ABSTRACT


This  report  describes  the  third  cycle  in  the  continuing  research  to  develop 
cost  estimating  relationships  between  cost  and  cost  factors,  to  be  used  in 
the  management  of  computer  programming.  Several  features  of  the  work  are 
presented,  including  basic  assumptions  of  the  analyses,  definitions,  data 
collection and validation procedures, and application of statistical techniques
such as correlation and multivariate regression analyses.

The  analysis  is  being  performed  with  169  data  points,  representing  computer 
programming  efforts  completed  by  System  Development  Corporation,  various 
industrial  organizations,  and  agencies  of  the  United  States  Air  Force. 

Several  characteristics  of  the  data  base  are  presented,  e.g.,  source,  size, 
range  of  selected  variables,  average  age  of  the  data  points,  and  applications 
and  computer  languages  used.  In  addition,  statistical  tests  were  performed 
to  ascertain  the  presence  of  subsamples  in  our  data;  the  results  of  these 
tests  are  also  presented. 

The  report  concludes  with  recommendations  for  the  collection  and  validation 
of  more  accurate  data,  as  well  as  for  general  improvements  in  the  approach 
and  methods  implemented  in  the  work. 
SECTION I


INTRODUCTION 


1.  Purpose  of  the  Report.  This  report  describes  the  current  work  in  the 
third  cycle  of  analysis  to  derive  estimating  equations  for  the  costs  of 
computer  program  production.  In  the  first  two  cycles  of  this  work,  data 
from  computer  programming  efforts  completed  at  System  Development  Corporation 
(SDC)  were  analyzed.  This  third  cycle  includes,  in  addition,  data  from  other 
sources, namely 14 Air Force and 7 industrial programming organizations.

The purpose of the work in this cycle is twofold: to search the data for
significant subsamples that are more homogeneous in terms of cost and, as a
result, to derive estimating equations with better precision than that exhibited
by equations derived from the entire sample. If significant subsamples are
identified,  it  is  expected  that  these  can  be  the  basis  for  more  thorough 
analyses  to  yield  still  further  improvements  in  cost  estimating  equations. 

The  analysis  in  this  third  cycle  is  not  complete,  e.g.,  no  estimating 
equations  have  been  derived  as  yet.  Such  results  will  be  included  in 
another  document  scheduled  to  appear  early  this  fall. 

This  report  includes  characteristics  of  the  data  base  such  as  the  sources  of 
the  data,  their  age  and  descriptive  statistics,  e.g.,  the  mean  and  standard 
deviation,  for  the  various  cost  measures,  production  rates  and  computer 
usage  rates  found  in  the  entire  sample  as  well  as  the  various  subsamples. 

The subsamples are based upon division of the sample according to programming
language (machine-oriented versus procedure-oriented), programming
application  (software,  business,  scientific,  other  computer  programs), 
and  computer  class  (large,  medium  or  small  as  characterized  by  cost). 

In  addition,  the  data  in  these  divisions  were  tested  to  see  if  they  indeed 
formed  subsamples  that  are  distinctly  different.  Using  cost  measures  such 
as  man  months  and  months  elapsed,  production  and  computer  usage  rates  as 
the  basis  for  comparison,  these  preliminary  tests  show  the  presence  of  some 
subsamples  in  the  data  and  also  confirm  previous  findings  of  our  research. 

For  example,  the  use  of  a  procedure-oriented  language  (POL)  as  compared  with 
the  use  of  a  machine-oriented  language  (MOL  or  assembly  language)  appears  to 
result  in  a  higher  production  rate  (machine  language  instructions  per  man 
month)  and  a  lower  computer  usage  rate  (computer  hours  per  thousand  machine 
language  instructions)  as  well  as  significant  differences  in  the  means  for 
the  basic  cost  measures.  The  statistical  tests  used  assume  that  the  data  are 
distributed  normally  and  that  the  variances  of  the  samples  compared  are  equal. 
Although our data do not completely conform to these conditions, we nevertheless
feel that the results presented do provide an insight into the significant
differences  in  the  samples  tested,  and  they  can  be  used,  with  caution,  along 
with  other  methods  of  cost  prediction  and  control. 


Further  analysis  of  the  data  is  continuing,  and  the  results  are  to  be  published 
later  this  year  in  the  form  of  a  Management  Handbook  which  will  also  contain 
other  management  aids  such  as  schedules,  charts,  checklists,  and  summaries  of 
pertinent  expert  opinions  regarding  computer  programming  management. 

Meanwhile,  this  report  is  intended  to  serve  as  the  following: 

.  A  technical  reference  for  the  Management  Handbook  (that  will  contain 
the  actual  cost-estimating  equations  derived  from  the  analysis)  by 
describing  the  methods  and  data  used  to  obtain  the  results. 

.  Feedback  to  the  several  organizations  that  supplied  inputs  to  this 
analysis  by  providing  a  complete  matrix  of  data  that  permits 
individual  contributors  to  compare  (or)  analyze  their  inputs  with 
other  data. 

.  A  statistical  description,  e.g.,  range,  mean,  standard  deviation  of 
the  data  for  the  costs,  production  rates,  and  computer  usage  rates  in 
each  hypothesized  subsample. 

.  An  exhibit  of  early  analytical  results  based  upon  statistical  tests 
to  determine  the  significance  of  the  divisions  into  subsamples. 

.  An  evaluation  of  the  feedback  received  and  results  to  date  and 
recommendations  for  modifying  the  present  assumptions  and  methods 
for  future  work. 

2.  Background.  Since  1964,  the  SDC  Programming  Management  Project  has  been 
performing  research  in  the  management  of  computer  programming.  This  report, 
one  of  a  series,  represents  work  done  under  the  sponsorship  of  the  Air  Force 
Electronic Systems Division, Deputate for Engineering and Technology,
Directorate of Computers.

The general aim of the Project is to develop techniques, standards, and guidelines
for managers of computer programming and buyers of the resultant products.
In  the  work  for  ESD,  these  aids  have  been  in  the  form  of  linear  cost  estimating 
equations,  describing  relationships  between  costs  and  cost  factors  (variables 
thought  to  influence  computer  programming  costs). 

The  work  has  been  conducted  in  cycles,  this  report  describing  the  third  such 
cycle.  What  we  term  a  "research  cycle"  consists  of  the  following: 

.  Design  (or  redesign)  of  the  questionnaire  used  to  collect  the  data. 

.  Collection  of  data  that  characterize  completed  programming  efforts. 

.  Validation  of  these  data  by  identifying  anomalies  and  gaps  and  then 
coordinating  with  the  original  respondents  to  clarify  and  complete 
the  questionnaires. 
. Application of statistical techniques, intuition, and experience,
first  to  reduce  the  total  number  of  cost  factors  to  be  considered  as 
independent  variables,  and  then  to  derive  the  equations.  This  is 
done,  for  example,  by  using  multiple  regression  and  predictor 
selection  algorithms  to  relate  the  remaining  cost  factors  as 
independent  variables  to  the  cost  measures  as  dependent  variables. 
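
The last step of the research cycle, relating cost factors to a cost measure, can be illustrated with a minimal least-squares sketch. It fits a single-factor linear estimating equation; the analysis described here uses multiple regression with several factors, so this is a simplification. The data values and the names `fit_line`, `instructions_thousands`, and `man_months` are hypothetical and are not taken from the data base.

```python
# Minimal sketch: least-squares fit of a one-factor linear estimating
# equation, cost = intercept + slope * factor.
# All numbers below are hypothetical illustration values.

def fit_line(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical cost factor: thousands of object instructions.
instructions_thousands = [2.0, 5.0, 8.0, 12.0, 20.0]
# Hypothetical cost measure: man months.
man_months = [6.0, 12.0, 18.0, 26.0, 42.0]

intercept, slope = fit_line(instructions_thousands, man_months)
estimate_for_10k = intercept + slope * 10.0
print(f"man months = {intercept:.2f} + {slope:.2f} * (thousands of instructions)")
print(f"estimate for 10,000 instructions: {estimate_for_10k:.1f} man months")
```

With more than one cost factor, the same idea is applied by solving the normal equations for all coefficients at once, which is what a multiple regression routine does.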

The  first  cycle  explored  the  feasibility  of  using  multivariate  statistics  to 
derive  estimating  equations  by  using  27  data  points,  all  representing 
computer programs completed within SDC (1). The second cycle repeated the
previous analysis, using an enlarged data base of 74 sample points, also
representing  computer  programming  projects  completed  within  SDC.  That 
phase  of  the  research  resulted  in  the  development  of  several  task  indices, 
i.e.,  weighted  composites  of  two  or  more  cost  factors,  and  the  use  of 
Stanine confidence band techniques as an aid to estimating computer programming
costs (2, 3). The third phase of the research, described in this
report,  is  being  conducted  with  169  points  and  is  aimed  at  the  following: 

.  Deriving  equations  with  improved  accuracy  and/or  usefulness  by  using 
subsamples  based  upon  divisions,  such  as  types  of  computers  used  in 
program  production,  and  types  of  programming  applications,  e.g., 
business,  scientific,  etc. 

.  Extending  the  use  of  the  data  base  by  testing  a  series  of  hypotheses 
of  interest  to  management,  e.g.,  MOL/POL  comparison. 

.  Measuring  the  improvement  in  statistical  prediction  and  trying  to 
identify  paths  for  further  research. 

The  remainder  of  this  report  consists  of  the  following  sections: 

.  Section  II— a  description  of  data  characteristics,  e.g.,  source, 
size,  age,  etc.,  and  a  preliminary  report  on  the  results  of  the 
subsample  tests  being  performed  with  the  data. 

.  Section  III — a  summary  of  the  deficiencies  of  the  research  model  and 
the  methods  used  in  the  work  to  date,  with  recommendations  for  changes 
in  both  for  future  work. 

.  Section  IV — appendices,  which  contain: 

A.  A  review  of  the  research  model  for  the  analysis. 

B.  An  outline  of  the  data  collection  and  validation  procedures  used 
to  form  our  data  base. 

C.  A  description  of  the  statistical  methods  applied  to  the  data. 
D. A data matrix representing the responses of the agencies who
contributed  data  to  our  sample.  This  appendix  includes  the 
definition  and  coding  of  all  items  used  in  the  questionnaire. 

E.  A  list  of  variables  that  were  formed  by  combining  or  transforming 
some  of  the  items  that  make  up  the  data  collection  questionnaire. 

F.  A  list  showing  the  correlation  coefficients  of  selected  variables 
with  the  four  cost  measures — man  months,  computer  hours,  months 
elapsed,  object  instructions  generated,  and  the  formed  variables — 
object  production  rate  and  object  computer  usage  rate. 
SECTION II


DATA CHARACTERISTICS AND PRELIMINARY SUBSAMPLE ANALYSIS


1.  Introduction.  The  data  described  in  this  Section  represent  the  combined 
sample  after  the  completion  of  the  third  collection  effort  conducted  by  the 
Programming  Management  Project. 

At  the  conclusion  of  the  third  effort,  106  points  had  been  collected,  66  points 
from  7  industrial  firms  and  40  points  from  14  agencies  of  the  United  States 
Air  Force.  These  106  points  combined  with  the  SDC  data  on  hand — 74  points — 
resulted  in  an  initial  data  base  of  180  points.  During  validation  of  the  data, 
five external sample points were dropped because of the general unreliability
of  the  data.  Eventually,  6  more  points — all  SDC  data — were  dropped  in  the 
early  stages  of  statistical  analysis  because  they  were  considered  to  be  too 
large  and  hence  unrepresentative  in  terms  of  cost,  e.g.,  1653  man  months.  As  a 
result,  169  data  points  were  used  as  input  to  the  statistical  analysis. 
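
The accounting of data points in the paragraph above can be checked with a few lines of arithmetic. This is only a bookkeeping sketch; the variable names are illustrative, and the counts are the ones stated in the text.

```python
# Bookkeeping sketch for the data point counts stated in the text.
collected_industry = 66   # points from 7 industrial firms (third effort)
collected_usaf = 40       # points from 14 USAF agencies (third effort)
sdc_on_hand = 74          # SDC points from the earlier cycles

dropped_in_validation = 5     # external points judged unreliable
dropped_as_too_large = 6      # SDC points considered unrepresentative

third_effort_total = collected_industry + collected_usaf
initial_data_base = third_effort_total + sdc_on_hand
analysis_sample = initial_data_base - dropped_in_validation - dropped_as_too_large

print(third_effort_total, initial_data_base, analysis_sample)
```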

Appendix  B  describes  the  collection  and  validation  of  these  data  as  well  as 
the  deletion  of  the  data  points. 

The  first  parts  of  this  Section  describe  the  entire  data  base  in  terms  of  data 
sources,  age  of  the  data,  and  descriptive  statistics.  The  second  part  of  this 
Section  deals  with  similar  characteristics  for  various  subsamples  of  the  data 
and  also  presents  the  results  of  statistical  tests  on  the  means. 

The larger size of the data base permitted us to continue the subsample
identification and testing that was initiated in previous cycles of our work
(2,3). The subsamples that were proposed for this analysis were grouped into
categories based on programming application, source language, developmental
computer size and interprogram communication.

Descriptive  statistics,  e.g.,  mean,  standard  deviation,  were  derived  for  each 
proposed  group,  and  then  used  to  test  for  the  presence  of  subsamples  by  means 
of  significance  tests.  The  descriptive  statistics  for  each  group  along  with 
the  results  of  the  tests,  are  presented  in  table  form. 

2. Sources of the Data Base. The present data base of 169 data points
represents 131 programming efforts completed at 8 industrial organizations
(including  SDC)  and  38  points  submitted  by  14  agencies  of  the  United  States 
Air  Force.  The  eight  industrial  organizations  include  four  companies  whose 
main  function  is  software  design  and  production,  and  four  whose  primary 
endeavor  is  hardware  development.  The  14  USAF  programming  agencies  are  support 
organizations  in  larger  commands. 

Table  I  illustrates  each  organization's  contribution  to  the  data  base. 
TABLE I

DISTRIBUTION OF DATA BY SOURCE

                                                  Number of Data
Organization                                      Points Submitted

U. S. Air Force*                                        38

Industry
  Computer Software Research and Development
    Company A                                            6
    Company B                                            1
    Company C                                            1
    Company D                                           69
  Computer Hardware and Aerospace
    Company E                                            2
    Company F                                            3
    Company G                                           21
    Company H                                           28

Total                                                  169

*Note: Data represent 14 separate USAF organizations.


3. Age of the Data Base. One of the most apparent features of the electronic
data  processing  field  is  the  rapidity  of  change.  We  have  seen  such  swift 
developments  in  computer  hardware  that  early  obsolescence  is  taken  for  granted. 
Although  this  obsolescence  does  not  seem  to  be  prevalent  in  software  develop¬ 
ment,  we  do  find  significant  year-to-year  advances  in  the  availability  of 
software  packages  and  compiler  design.  Therefore,  the  currency  of  the  data 
should  be  taken  into  account  in  the  interpretation  of  analysis  results,  since 
the  time  interval  to  validate  and  analyze  the  data  (see  Appendices  B  and  C), 
ranges  from  nine  months  to  a  year.  In  effect,  the  resultant  equations  reflect 
programming  as  it  was  two  years  ago,  and  it  may  be  quite  different  today. 

Table  II  lists  the  starting  dates  of  the  computer  programs  in  our  sample  and 
the  average  age  of  the  data. 
TABLE II

AGE OF THE DATA BASE

Data Point          Age of           Number        Percent
Programming         Data Point       of            of
Start Date          (Years)          Points        Total

1965                   .5              36           21.3
1964                  1.5              72           42.6
1963                  2.5              27           16.0
1962                  3.5              16            9.5
1961                  4.5              17           10.1
1960                  5.5               1             .5

Total                                 169          100.0

Average Age of Data Base: 2.0 Years
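
The average age reported in Table II is a weighted mean of the age column, with the number of points in each start year as the weights. The sketch below recomputes it from the table entries; the variable names are illustrative only.

```python
# Weighted average age of the data base, using the rows of Table II.
# Each row: (programming start year, age in years, number of points).
table_ii = [
    (1965, 0.5, 36),
    (1964, 1.5, 72),
    (1963, 2.5, 27),
    (1962, 3.5, 16),
    (1961, 4.5, 17),
    (1960, 5.5, 1),
]

total_points = sum(count for _, _, count in table_ii)
average_age = sum(age * count for _, age, count in table_ii) / total_points
percent_of_total = {year: 100.0 * count / total_points
                    for year, _, count in table_ii}

print(f"total points: {total_points}")
print(f"average age:  {average_age:.2f} years")
for year, pct in percent_of_total.items():
    print(f"{year}: {pct:.1f} percent")
```

The weighted mean comes to just under two years, which rounds to the 2.0 years shown in the table.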

4. Size of Programs in Sample. The histograms for two cost measures,
Man Months and Months Elapsed (Figures 1 and 2), illustrate the general
distribution of the data in terms of man months and months elapsed. The
range  of  the  programs  is  quite  wide,  but  the  smaller  programs  dominate 
the  distribution  of  our  sample.  This  seems  to  indicate  that  programs 
requiring  an  approximate  production  time  of  5  to  15  man  months  and  2  to  8 
elapsed  months  are  the  most  common. 

No histograms were constructed for computer hours and number of object
instructions, due to the extreme range of these variables. However,
Table III contains the range, mean, and standard deviation for the cost
measures, production rates and computer usage rates for the total sample.
Figure 1. Histogram of Man Months

Figure 2. Histogram of Months Elapsed


TABLE III

STATISTICS OF COST MEASURES, PRODUCTION RATES AND
COMPUTER USAGE RATES FOR TOTAL SAMPLE, N = 169

                                                 Max      Min     Mean     S.D.

Months Elapsed                                    36        1      9.2      7.3

Computer Hours                                  2650        1      237      464

Number of Object Instructions Generated       217000      150    12152    21682

Man Months                                       300        1       40       62

Source Production Rate (Source
  Instructions per Man Month)                   6500       10      475      787

Object Production Rate (Object
  Instructions per Man Month)                  13889       10      970     1820

Source Computer Usage Rate (Computer
  Hours per 1000 Source Instructions)            331      .23       31       45

Object Computer Usage Rate (Computer
  Hours per 1000 Object Instructions)            294      .05       24       38

5.  Subsample  Descriptions.  In  the  previous  two  cycles  of  our  work,  the 
statistical  methods  used  to  derive  estimating  equations  were  applied  to  the 
total data base without any attempt to distinguish between program characteristics,
such as application, source language, etc., although some initial
attempts  were  made  to  develop  estimating  equations  for  subsamples  established 
by  restricting  the  ranges  of  the  cost  measure,  man  months  (3). 

In  this  third  cycle,  the  analysis  of  subsamples  is  the  central  issue  in  trying 
to  derive  improved  estimating  equations.  Four  categories  of  subsamples  were 
selected  for  analysis  based  on  (a)  the  fact  that  these  factors  were  commonly 
hypothesized  as  affecting  programming  costs,  and  (b)  the  availability  of  data 
in  our  sample.  The  subsamples  that  were  proposed  for  study  in  this  analysis 
were  grouped  into  the  following  categories: 

.  Programming  Application 

.  Production  Source  Language 

.  Production  Computer  Size 

.  Interprogram  Communications 

Each  of  these  is  defined  in  detail  below. 
a. Programming Application consists of the following types of programs:


(1)  Business  applications,  where  storage  and  retrieval  with  large  files 
of  data  were  the  predominant  operations.  Applications  include  inventory  control, 
financial  report  preparation,  customer  billing,  and  payroll  calculations. 

(2)  Scientific  applications,  where  procedure-oriented  calculations  were 
made  on  relatively  small  data  input,  for  example:  geodetic  studies,  network 
adjustments,  and  statistical  analyses. 

(3)  Utility  and  support,  where  the  computer  programs  developed  were  to 
be  used  as  tools  for  support  of  computer  operations  and  programming  per  se,  for 
example:  compilers,  executive  routines,  master  tape  generators,  and  data 
conversion  for  use  on  different  types  of  hardware. 

(4)  Other  applications,  where  on-line,  real-time  computation  and  response 
were  the  primary  operations,  e.g.,  satellite  control  and  tracking  programs. 

Table  IV  shows  the  data  distribution  by  application  and  contributor. 

TABLE IV

DISTRIBUTION OF DATA BY PROGRAMMING APPLICATION

                                      Total     Entries by Type of Program, in column
                                      Data      order (Business, Scientific, Software,
Organization                          Points    Other); blank cells omitted

Govt
    U. S. Air Force*                    38      26, 10, 2

Industry
  Computer Software Research and Development
    Company A                            6      3, 3
    Company B                            1      1
    Company C                            1      1
    Company D                           69      17, 12, 5, 35
  Computer Hardware and Aerospace
    Company E                            2      2
    Company F                            3      2, 1
    Company G                           21      19, 2
    Company H                           28      11, 17

Total                                  169      79, 27, 28, 35

*Note: Data represent 14 separate USAF organizations.
b. Production Source Language consists of programs using the following
source languages:

(1) Procedure-Oriented Language (POL): a computer-independent language
that describes how the process of solving the problem is to be carried out.¹

(2) Machine-Oriented Language (MOL): a language designed for interpretation
and use by a specific computer.¹

Of the 169 data points, 46 were written in POL and 123 were produced using MOL.

Several  POLs  were  implemented,  such  as  JOVIAL,  COBOL,  FORTRAN,  ALGOL,  GECOM  and 
ALTAC.  The  source  instructions  used  to  produce  the  123  programs  coded  in  MOL 
represent  the  various  machine  languages  corresponding  to  the  production  computers 
listed  in  Table  VI. 

Table  V  illustrates  the  distribution  of  the  Programs  coded  in  POL  and  MOL. 

TABLE V

DISTRIBUTION OF DATA BY PRODUCTION SOURCE LANGUAGE

                                      Total     Entries by Production Source Language, in
                                      Data      column order (POLs: JOVIAL, COBOL, FORTRAN,
                                      Points    Other POLs; MOLs: Autocoder, Machine
Organization                                    Language); blank cells omitted

Govt
    U. S. Air Force*                    38      1, 3, 4, 6, 13, 11

Industry
  Computer Software Research and Development
    Company A                            6      6
    Company B                            1      1
    Company C                            1      1
    Company D                           69      15, 54
  Computer Hardware and Aerospace
    Company E                            2      1, 1
    Company F                            3      3
    Company G                           21      6, 1, 3, 11
    Company H                           28      3, 2, 6, 17

Total                                  169      16, 12, 8, 10, 19, 104

*Note: Data represent 14 separate USAF organizations.


¹ Glossary of Data Processing and Communications Terms, Honeywell Information
Services, Wellesley Hills, Massachusetts, 1964.
TABLE VI

PRIMARY COMPUTERS USED IN THE PRODUCTION OF THE DATA POINTS

                                  Primary               No. of Data Points
                                  Development           on which Computer
Manufacturer                      Computer              was Used

Autonetics                        Recomp II                     1

Burroughs                         220                           1
                                  825                           5

Control Data Corporation          160A                         15
                                  1604 A/B                      9
                                  3600                          4

General Electric                  225                          11
                                  235                           2
                                  425                           2

IBM                               360/30                        7
                                  1401                          9
                                  1410                         13
                                  1440                          1
                                  7010                          1
                                  7040                         14
                                  7044                          3
                                  7080                          8
                                  7090                          7
                                  7094                         11
                                  FSQ/7*                       17
                                  FSQ/8*                        3
                                  FSQ/32*                       7

Digital Equipment Corporation     PDP 1                         1

Philco                            2000-210                      6
                                  212                           2

RCA                               301                           5
                                  501                           1

UNIVAC                            1107                          3

*Note: These computers were specially designed for military command and control
systems, although they can be used for many programming applications.
c. Production Computer Size is based on equivalent purchase cost, using the
following size classifications:

(1) Large: machines costing $750,000 or more

(2) Medium: machines costing over $100,000 but less than $750,000

(3) Small: machines costing less than $100,000

Table VI lists the 28 primary computers used in producing the programs in the
data base. In several cases, more than one machine was used in program
developments;  however,  the  above  division  only  considered  the  computer  that 
was  used  for  the  major  portion  of  the  programming  effort. 
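
The size classification above amounts to a simple threshold rule on equivalent purchase cost. The sketch below is one way to write it; the function name `computer_size_class` and the example costs are hypothetical. The text does not say how a machine costing exactly $100,000 is classed, so treating it as Medium is an assumption made here.

```python
# Threshold rule for development computer size, based on equivalent
# purchase cost in dollars. Thresholds are those given in the text.
LARGE_THRESHOLD = 750_000
SMALL_THRESHOLD = 100_000

def computer_size_class(purchase_cost_dollars):
    """Classify a computer as Large, Medium, or Small by purchase cost."""
    if purchase_cost_dollars >= LARGE_THRESHOLD:
        return "Large"
    # Assumption: a cost of exactly $100,000 is treated as Medium.
    if purchase_cost_dollars >= SMALL_THRESHOLD:
        return "Medium"
    return "Small"

# Hypothetical purchase costs, for illustration only.
for cost in (2_500_000, 400_000, 60_000):
    print(cost, computer_size_class(cost))
```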

d.  Program  Communications.  Attempts  to  distinguish  between  programs  that 
were  produced  as  parts  of  a  larger  system,  requiring  program  interfaces,  and 
stand-alone  programs  requiring  no  interfaces  with  other  programs.  One  hundred 
nineteen  points  represented  computer  programs  that  were  produced  as  parts  of 
larger systems. For example, a simple payroll program was considered a "stand-alone"
program, while the graphic display portion of a command and control
system  was  categorized  as  part  of  a  larger  system. 

6. Tests of Subsample Statistics. After dividing the data points into these
subsamples, descriptive statistics (the mean, standard deviation, the minimum,
the maximum) were derived for the cost measures, as well as the source and
object production rates (instructions per man month) and source and object
computer usage rates (computer hours per 1000 instructions).
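
The two formed rates follow directly from their definitions: production rate is instructions per man month, and computer usage rate is computer hours per 1000 instructions. A minimal sketch is given below; the function names and the example program are hypothetical and do not correspond to any data point in the sample.

```python
# Formed variables, as defined in the text.

def production_rate(instructions, man_months):
    """Instructions produced per man month."""
    return instructions / man_months

def computer_usage_rate(computer_hours, instructions):
    """Computer hours used per 1000 instructions."""
    return computer_hours / (instructions / 1000.0)

# Hypothetical program: 12,000 object instructions, 40 man months,
# 240 computer hours.
object_instructions = 12_000
man_months = 40
computer_hours = 240

object_production_rate = production_rate(object_instructions, man_months)
object_usage_rate = computer_usage_rate(computer_hours, object_instructions)

print(f"object production rate:     {object_production_rate:.0f} instructions per man month")
print(f"object computer usage rate: {object_usage_rate:.0f} hours per 1000 instructions")
```

The source rates are computed the same way, with the count of source instructions in place of object instructions.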

These descriptive statistics were then compared by applying t and F tests²
to assess the significance of differences between the means of the various
subsamples or divisions of our data.

The  results  of  the  t  and  F  tests  for  program  applications,  production  source 
language  and  development  computer  size  are  tabulated  in  Tables  VII,  VIII,  and 
IX  respectively.  Each  table  contains  the  range,  mean,  and  standard  deviation 
for  the  cost  measures,  production  rates,  and  computer  usage  rates.  The  means 
for  each  variable  were  examined  by  the  significance  tests  and  the  resultant 


² The t and F tests are used to determine the statistical reliability of the
observed difference between the means of selected groups. The t distribution
is used to test the significance of the difference between two means; the F
distribution is used when more than two observed means are to be examined. For
a detailed description of these tests see Dixon, W. J. and F. J. Massey,
Introduction to Statistical Analysis, New York, McGraw-Hill, 1957, and Hays,
W. L., Statistics for Psychologists, New York, Holt, Rinehart and Winston, 1963,
Chapters 10 and 11.
TABLE VII

PROGRAMMING APPLICATIONS:
STATISTICS OF COST MEASURES, PRODUCTION RATES, AND COMPUTER USAGE RATES

                                          Months Elapsed              Computer Hours               Object Instr.                  Man Months
Programming             # of
Applications            Pts.    Max    Min   Mean   S.D.    Max    Min   Mean   S.D.    Max    Min   Mean   S.D.    Max    Min   Mean   S.D.

Business                  79     36      1    6.4    6.2   2100      1     73    250 217000    200  11022  28027    185      1   13.2   26.2
Scientific                27     21      1    8.7    6.0    850      1    137    224  58300    217  13560  16537    232      1   42.0   67.2
Utility and Support       28     27      3   16.2    7.4   2650      5    766    730  50000   1000  18528  13673    260      1   92.7   63.3
Other                     35     29      4   10.3    6.6   1420      2    263    395  56000    150   8481  10984    300      1   54.7   81.6

Significance of
Population Means                                p < .01¹                     p < .01                     p < .05                     p < .01


                                        Source Instr./MM            Object Instr./MM  Object Computer Usage Rate  Source Computer Usage Rate
Programming             # of
Applications            Pts.    Max    Min   Mean   S.D.    Max    Min   Mean   S.D.    Max    Min   Mean   S.D.    Max    Min   Mean   S.D.

Business                  79   6500     21    679   1067  13889     26   1521   2388     80    .23     12     18    115    .23     21     27
Scientific                27   1744     26    368    428   7250     83    882   1479    140    .25     18     28    211    .25     31     51
Utility and Support       28   2000     10    294    378   2055     10    410    558    294   1.10     57     68    331   2.53     63     74
Other                     35    669     41    256    166   1527     56    292    267    129   3.86     30     28    129   3.86     32     28

Significance of
Population Means                                 p < .01                     p < .01                     p < .01                     p < .05

¹ p = probability of falsely rejecting the hypothesis that the population means are equal; e.g., p < .01 indicates
that if the hypothesis is rejected, it will be done incorrectly less than 1 out of 100 times.
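
As an illustration of the F test applied to more than two means, the sketch below computes a one-way F statistic from the group sizes, means, and standard deviations listed for Months Elapsed in Table VII. It is the textbook analysis-of-variance formula, not necessarily the exact procedure used in this analysis, and it assumes the tabulated standard deviations were computed with an n - 1 divisor. The function name `one_way_f` is illustrative.

```python
# One-way F statistic from summary statistics.
# Each entry: (number of points, mean, standard deviation) for
# Months Elapsed, taken from Table VII.
months_elapsed = {
    "Business":            (79, 6.4, 6.2),
    "Scientific":          (27, 8.7, 6.0),
    "Utility and Support": (28, 16.2, 7.4),
    "Other":               (35, 10.3, 6.6),
}

def one_way_f(summary):
    """Return (F, df_between, df_within, grand_mean)."""
    total_n = sum(n for n, _, _ in summary.values())
    grand_mean = sum(n * mean for n, mean, _ in summary.values()) / total_n
    # Between-group sum of squares: spread of group means about the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2
                     for n, mean, _ in summary.values())
    # Within-group sum of squares, assuming each S.D. used an n - 1 divisor.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in summary.values())
    df_between = len(summary) - 1
    df_within = total_n - len(summary)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within, grand_mean

f_stat, df_between, df_within, grand_mean = one_way_f(months_elapsed)
print(f"grand mean = {grand_mean:.1f} months")
print(f"F({df_between}, {df_within}) = {f_stat:.1f}")
```

The statistic is far above the tabulated 1 percent critical value for these degrees of freedom (roughly 3.9, an approximate figure), which agrees with the p < .01 entry for Months Elapsed. The grand mean recovered from the subsamples also matches the 9.2 months shown for the total sample in Table III.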


TABLE VIII

MACHINE- AND PROCEDURE-ORIENTED LANGUAGE:
STATISTICS OF COST MEASURES, PRODUCTION RATES, AND COMPUTER USAGE RATES

                                                Months Elapsed              Computer Hours               Object Instr.                  Man Months
Production                    # of
Language                      Pts.    Max    Min   Mean   S.D.    Max    Min   Mean   S.D.    Max    Min   Mean   S.D.    Max    Min   Mean   S.D.

Machine-Oriented Language      123     29      1    9.6    6.9   2650      1    289    499  82000    150  11024  14521    300      1     48     68
Procedure-Oriented Language     46     36      1    8.2    8.3   2100      2     99    317 217000    217  15231  34445    185      1     18     34

Significance of
Population Means                                         N.S.²                    p < .01¹                        N.S.                     p < .01


                                              Source Instr./MM            Object Instr./MM  Source Computer Usage Rate  Object Computer Usage Rate
Production                    # of
Language                      Pts.    Max    Min   Mean   S.D.    Max    Min   Mean   S.D.    Max    Min   Mean   S.D.    Max    Min   Mean   S.D.

Machine-Oriented Language      123   6500     10    511    854   7250     10    610   1069    331    .25     31     46    294    .50     30     43
Procedure-Oriented Language     46   3667     22    389    579  13889    105   1977   2838    211   1.20     32     43     53    .30     10     14

Significance of
Population Means                                          N.S.                     p < .05                        N.S.                     p < .01

¹ p = probability of falsely rejecting the hypothesis that the population means are equal; e.g., p < .01 indicates
that if the hypothesis is rejected, it will be done incorrectly less than 1 out of 100 times.

² N.S. = not significant at the 5 percent level.
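
For a comparison of two means, such as MOL versus POL, the t test can be sketched from the summary statistics in Table VIII. The code below uses the pooled-variance (equal-variance) form of the statistic, which matches the assumptions stated in Section I; whether the analysis used exactly this form is an assumption, as is the n - 1 divisor for the tabulated standard deviations. The function name `pooled_t` is illustrative.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_err, df

# (n, mean, S.D.) for MOL and POL, taken from Table VIII.
t_man_months, df = pooled_t(123, 48, 68, 46, 18, 34)
t_months_elapsed, _ = pooled_t(123, 9.6, 6.9, 46, 8.2, 8.3)

print(f"man months:     t = {t_man_months:.2f} with {df} degrees of freedom")
print(f"months elapsed: t = {t_months_elapsed:.2f} with {df} degrees of freedom")
```

With 167 degrees of freedom, the two-tailed critical values are approximately 1.97 at the 5 percent level and 2.61 at the 1 percent level. The man-months statistic exceeds the 1 percent value and the months-elapsed statistic falls below the 5 percent value, in line with the p < .01 and N.S. entries in the table. The standard deviations of the two groups differ by about a factor of two for man months, so the equal-variance assumption holds only roughly here.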


TABLE IX

DEVELOPMENT COMPUTER SIZE¹:
STATISTICS OF COST MEASURES, PRODUCTION RATES, AND COMPUTER USAGE RATES

                                          Months Elapsed              Computer Hours               Object Instr.                  Man Months
Development             # of
Computer Size¹          Pts.    Max    Min   Mean   S.D.    Max    Min   Mean   S.D.    Max    Min   Mean   S.D.    Max    Min   Mean   S.D.

Large                    105     36      1   10.6    8.1   2650      2    307    520  83336    150  13999  15807    300      1     54     71
Medium                    53     27      1    7.0    5.4   1918      1    127    350 217000    200  10048  31185    169      1     16     32
Small                     11     17      1    6.6    4.5    633      1    107    186  35000    335   4825  10109     64      2     13     19

Significance of
Population Means                                p < .01²                     p < .05                     p < .05                     p < .01


                                        Source Instr./MM            Object Instr./MM  Source Computer Usage Rate  Object Computer Usage Rate
Development             # of
Computer Size¹          Pts.    Max    Min   Mean   S.D.    Max    Min   Mean   S.D.    Max    Min   Mean   S.D.    Max    Min   Mean   S.D.

Large                    105   5200     10    394    658  13889     10    950   1926    331    .23     36     51    294    .25     27     43
Medium                    53   6500           677   1036   8040     60   1153   1791    178   1.25     22     33    178    .31     18     30
Small                     11    815     26    320    227   1075     25    380    336     80   1.11     32     24     80   1.10     31     25

Significance of
Population Means                                   N.S.³                     p < .01                        N.S.                        N.S.

¹ Small: purchase price under $100,000.
  Medium: purchase price between $100,000 and $750,000.
  Large: purchase price over $750,000.

² p = probability of falsely rejecting the hypothesis that the population means are equal; e.g., p < .01 indicates
that if the hypothesis is rejected, it will be done incorrectly less than 1 out of 100 times.

³ N.S. = not significant at the 5 percent level.
probability level of significance³ is shown below the appropriate costs and
rates. If the tests indicated no significant difference in the observed
means, the notation N.S. was placed under the specific variable.
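
As an illustration of the kind of test summarized in the table, the following
sketch computes a one-way analysis-of-variance F ratio from group sizes,
means, and standard deviations alone, using the man-months figures tabulated
for the Large, Medium, and Small development-computer subsamples. The function
name, the treatment of the tabulated S.D. values as sample standard
deviations, and the choice of a plain one-way F test are assumptions made for
illustration; the computations actually performed may have differed in detail.

```python
def f_ratio_from_summary(groups):
    """One-way ANOVA F ratio from (n, mean, sd) triples.

    Returns (F, df_between, df_within). The sd values are assumed to be
    sample standard deviations (the report does not say which form was used).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Variation of the subsample means about the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Pooled variation inside the subsamples.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Man months by development computer size: (number of points, mean, S.D.)
man_months = [(105, 54, 71), (53, 16, 32), (11, 13, 19)]
f, df1, df2 = f_ratio_from_summary(man_months)
print(f"F({df1}, {df2}) = {f:.2f}")
```

For these figures the ratio is well above the tabulated 1 percent critical
value for 2 and 166 degrees of freedom (roughly 4.7), which is consistent with
the p < .01 entry shown for man months.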

No table was compiled for the Program Communication subsample since, with the
exception of man months, the means of the variables were not significantly
different. The tests did show that programs produced as entities in themselves
required approximately one half the production effort, in terms of man months,
of programs produced as part of larger systems. In spite of these preliminary
results, we still feel that subsampling based on interprogram communication
is an important factor to be considered in cost estimation of computer
programming, and further study may eliminate the reasons for the initial
lack of statistical confirmation in this analysis.

Based  on  our  sample,  the  tests  indicate  the  following: 

.  POLs (see Table VIII) are more effective than MOLs in terms of man
months, computer hours, object instructions per man month, and computer
hours per 1000 object instructions. These results confirm earlier
findings derived with a smaller sample in the second cycle (2).

.  The average POL expansion ratio is three to four MOL instructions per
one POL instruction; this also substantiates earlier findings.

.  Software programs (see Table VII), the tools of programming such as
compilers, executives, and utility routines, require more production
time (months elapsed) and effort (man months and computer hours) than
other applications.

.  Business applications (see Table VII), on the other hand, appear
to be less difficult to produce, as measured, for example, by
production rate, and require fewer computer hours than the other
applications.

These early results appear to have strong statistical significance. For
example, the probability that such differences could occur by chance is in
some cases less than one in one hundred and in other cases less than five in
one hundred. These tests, the t and the F, should be made on the means of
variables with symmetrical distributions that approach the normal bell-shaped
curve and that have approximately equal variances. The data being analyzed
in this cycle show some skewness or nonsymmetry, and outliers may have to be
deleted in some cases or winsorized in others, i.e., replaced by an artificial
piece of data that is more representative, if further study indicates that
such outliers significantly bias the analysis.

³ The results of the t and F tests are given in the form of the probability of
falsely rejecting the hypothesis that the means of the subsamples are equal.
Thus, p < .01 means that if the hypothesis is rejected, it will be done
incorrectly less than 1 out of 100 times.
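
As an illustration of the winsorizing mentioned above, the following sketch
replaces the k smallest and k largest values of a variable with the nearest
values that remain. The function name, the symmetric treatment of both tails,
and the sample figures are assumptions for illustration only; no particular
winsorizing rule has been adopted in this work.

```python
def winsorize(values, k=1):
    """Return a copy of values in which the k smallest and k largest
    entries are replaced by the nearest non-extreme values."""
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must satisfy 0 <= 2*k < len(values)")
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    # Clamp every value into the interval [low, high], keeping input order.
    return [min(max(v, low), high) for v in values]


# Invented man-month figures with one extreme value in the upper tail.
costs = [4, 9, 12, 16, 21, 30, 400]
print(winsorize(costs, k=1))
```

Unlike outright deletion, this keeps the sample size unchanged while limiting
the pull of the extreme points on the means and variances being tested.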

Also, in interpreting these results it should be noted that the t and F tests
can only assess existing differences in the means of the selected subsamples;
they in no way indicate the nature of these differences. The differences may
be due to a variety of causes, ranging from sampling error to other underlying
factors. For example, part of the difference detected in the means for
Business and Software applications may be due to the heavy use of POLs in
Business applications and the common use of MOLs in developing Software
programs.
SECTION III


EVALUATION  OF  WORK  TO  DATE  AND  RECOMMENDATIONS  FOR  FUTURE  WORK 


During the second cycle, we began to evaluate the methods used and the results
obtained in order to develop guidelines for improvements in the future work.
We have not tested the derived equations, i.e., we have not used them to
estimate costs before production of a large number of computer programs and
then, after their completion, compared these estimates with actual costs.
Therefore, this evaluation is based upon (1) feedback from readers of our
reports, (2) the statistics that measure the expected errors in the estimates
calculated by the equations, (3) a quality assessment of the data used,
and (4) reflection on the methods used and the work sequence. We plan
to continue this evaluation during the remainder of the third cycle.

Although, at present, we have no standards to assess what level of accuracy
is needed or attainable for such estimating equations, we feel that the
standard errors of estimate, the measures of estimating precision for the
equations derived in the second cycle, are too large. For example, the ratio
of expected actual costs to estimated costs can be as large as 100 percent in
some cases. Further, some of the cost factors in the equations, although
statistically significant, do not have strong intuitive appeal.
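
For reference, the standard error of estimate is conventionally computed as
the square root of the residual sum of squares divided by the residual degrees
of freedom, n - k - 1 for n data points and k predictors. The sketch below
applies that textbook form to invented figures; the function name and data are
illustrative assumptions, not values from the second-cycle equations.

```python
import math


def standard_error_of_estimate(actual, estimated, n_predictors):
    """Square root of the residual sum of squares over (n - k - 1)."""
    n = len(actual)
    dof = n - n_predictors - 1
    if len(estimated) != n or dof <= 0:
        raise ValueError("need matching lists and more points than parameters")
    sse = sum((a - e) ** 2 for a, e in zip(actual, estimated))
    return math.sqrt(sse / dof)


# Invented man-month costs and the estimates from a two-predictor equation.
actual = [12, 30, 7, 55, 20, 41]
estimated = [15, 26, 10, 48, 24, 40]
see = standard_error_of_estimate(actual, estimated, n_predictors=2)
mean_cost = sum(actual) / len(actual)
print(f"standard error of estimate = {see:.2f} man months")
print(f"relative to mean cost      = {see / mean_cost:.0%}")
```

Expressing the standard error as a fraction of the mean cost, as in the last
line, is one simple way to judge whether the precision is acceptable for
planning purposes.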

The third cycle may yield equations with more intuitive appeal and greater
estimating precision. To try to obtain more appealing variables in the
equations, we are changing the sequence of analysis by first deriving
equations to estimate the size and number of instructions of computer programs
and then using these relationships to form independent variables to appear in
the equations for man months and computer hours. Also, in an attempt to reduce
the standard errors of estimate, we expect to derive these equations for
subsamples whose costs would be more homogeneous.

We  have  some  doubts  about  the  reliability  of  the  new  data.  Despite  the 
lengthy  effort  to  validate  the  data,  we  feel  some  of  the  terms  used  in  the 
questions  were  still  not  interpreted  in  a  uniform  manner  by  all  respondents. 

There  are  several  positive  aspects  of  the  results  to  date.  Aside  from  the 
catalytic  effects  of  stimulating  an  analytic  approach  to  the  cost  question 
and  promoting  the  need  to  identify  and  collect  costs  expended  for  programming 
products,  the  results  to  date  provide  the  following  benefits: 

.  A  checklist  of  cost  factors  to  consider  in  estimating  costs. 

.  A  large  collection  of  numerical  data  for  various  types  of  programming 
efforts  that  can  be  used  for  comparison  with  estimated  or  actual  costs. 
.  Bivariate relationships between cost measures, such as man months
versus computer hours, that can help managers make estimates and
compare actual costs.

.  Estimating  equations  that  supply  a  systematic  way  for  estimating 
costs  and  that  are  suitable  for  comparison  with  estimates  made  by 
other  techniques. 

.  Probabilistic confirmation or denial of various hypotheses (such as
those in Section II) by using numerical data on the impact of types
of tools and applications on costs and resource usage rates.

We  now  identify  some  probable  causes  for  the  lack  of  appeal  and  accuracy  in 
the  results  to  date. 

1.  Growing Dimensions of ADP. One explanation for the large variation in
our data is the rapid growth of ADP, which can be represented in many
dimensions that appear to influence costs. For example, consider the
increasing range of computer design and power, computer configurations,
programming languages, computer programs for support, programmer proficiency
and experience, and types of applications. If indeed these influence cost as
has been hypothesized, then their increasing range should increase the range
of cost data.

2.  The  Questionnaire.  The  mail-out  questionnaire  has  been  the  primary  data 
collection  tool  used  in  our  work.  Appendix  D  shows  the  items  in  the  current 
version.  Both  the  mechanics  of  its  use  and  its  contents  have  shown  several 
shortcomings  as  follows: 

.  Despite  two  modifications  in  the  second  and  third  cycles  based  upon 
feedback,  experience,  and  analytical  results,  the  items  used  still 
lack  precise,  commonly  understood  definitions  that  would  permit 
consistent  answers. 

.  Although our plan was to reduce the number of items in the questionnaire
in each revision, the number of items has remained about the same
(approximately 90). In the first cycle, the sample size (27 data points)
was too small for us to confidently discard items, so the
questionnaire for the second cycle, although revised, remained about
the same size. Unfortunately, to take advantage of the opportunity
to collect the data from Air Force and industrial organizations, we
had to distribute the questionnaire for the third cycle before we
completed the analysis in the second cycle; so again we could not
streamline the questionnaire. As a result, the current, rather
lengthy questionnaire discourages quick, accurate responses and
requires extensive data handling and extra analytical procedures
to arrive at estimating equations.
3.  Data Validation. The validation procedure, described in Appendix B, is
a time-consuming process that was aggravated by the size and structure of the
questionnaire and the method by which it was administered. Communication
with project members and individual respondents was minimal at the time the
questionnaires were completed, because we dealt only with a central contact
in each contributing organization. Thus, inevitable misunderstandings led
to a lengthy process of validation to clarify the data.

The validation was not completely successful. In some cases, the original
respondent was no longer available at the time of validation, and a lengthy
search had to be initiated to locate the person. In other instances, the
data needed to rectify certain questionable items could not be found, and
the data point had to be discarded.

4.  Sampling.  It  is  doubtful  that  the  data  sample  obtained  so  far  is  a 
representative  one.  Defining  a  suitable  statistical  sample  that  can  serve 
as  a  basis  for  generalizing  the  results  to  the  entire  population  of  computer 
programming  jobs  is  a  very  difficult  problem.  The  characteristics  of  this 
population  are  not  well  defined  and,  as  indicated  earlier,  its  dimensions 
are  changing  rapidly. 

The  following  recommendations  are  put  forth  as  some  possible  solutions  to 
the  above  problems: 

.  Renew  the  research  effort  to  define  and  describe  computer  programming 
jobs  to  determine  more  meaningful  predictor  variables  for  assessing 
program  complexity,  difficulty,  and  size. 

.  Improve the definition of a data point (1) to differentiate between
runs, subprograms, subroutines, programs, and program systems, and
(2) to define beginning and end points more precisely for the phases
of programming included in a data point.

.  Collect and analyze data for a subset of programming jobs. For
example, the preliminary subsample study described in Section II of
this report indicates that, based on the cost measures, production
rates, and computer usage rates, MOL/POL and programming applications
are likely candidates for further study. Restricting the collection
and analysis to such subsets of programming activities would reduce
the variation in collected costs and cost factors and, consequently,
reduce the data anomalies produced by a collection procedure that
entails various computer programs and programming procedures.

.  Before attempting widespread use of the questionnaire, test it
thoroughly by means of trial interviews with programmers and programming
managers. This would serve a dual purpose: (1) to highlight
existing inconsistencies, and (2) to indicate the factors that should
be included in, or deleted from, the questionnaire.
.  Eliminate the "yes or no" response items and replace these with a
more meaningful numerical value or scale.

.  Supplement  the  data  collected  by  questionnaire  with  program  listings, 
which  would  supply  more  accurate  data  for  several  items  now  in  the 
questionnaire. 

.  Collect  the  data  by  means  of  personal  interviews  rather  than  by 
mail  and/or  phone  to  reduce  the  number  of  misinterpretations  that 
occur  from  installation  to  installation.  Although  these  initial 
interviews  would  be  more  expensive,  they  would  considerably  reduce 
the  need  for  the  prolonged  and  costly  validation,  such  as  that 
carried  out  in  this  cycle. 

.  Attempt  to  include  some  form  of  direct  coding  for  questionnaire 
responses  to  allow  more  accurate  and  efficient  transfer  and  storage 
of  data  by  computer. 

Although we feel that the above recommendations would alleviate many of the
existing problems, it would be virtually impossible, at present, to design
an experiment in which none of these difficulties were present. For example,
since the lack of standard terminology in the EDP industry abets the
misunderstanding and misinterpretation of questionnaire items, which in turn
leads to inaccurate data and unreliable results, we doubt that the data
validation could be completely avoided with any collection procedure.

Thus,  the  recommendations  are  proposed  to  improve  our  methods,  and  not 
to  solve  the  difficulties  that  are  encountered  in  any  basic  research. 




APPENDIX  A 


BASIC  ASSUMPTIONS  OF  THE  RESEARCH  MODEL 


In  beginning  the  work  to  derive  the  cost  estimating  relationships  for  computer 
programming,  we  established  a  model  that  would  provide  a  scientific  frame  of 
reference  for  the  analysis.  This  model  has  been  used  in  the  work  during  all 
three  cycles  of  the  research  and  consists  of  the  following  assumptions: 

.  In collecting data for the analyses, we have assumed that computer
programming has certain common characteristics that can be generalized,
i.e., that the cost factors that explain the variation in the costs
of computer programming can be selected from a comprehensive collection
that has been divided into three groups corresponding to requirements,
resources, and environment for computer programming.

.  These  cost  factors  can  be  formulated  into  items  for  a  questionnaire, 
and  the  same  set  of  questions  can  be  used  to  collect  data  on  all 
computer  programs. 

.  The primary costs, such as manpower, measured in man months, and
computer time, measured in hours, can be considered as dependent
variables that can be predicted by a linear combination of cost
factors used as independent variables. Values for these variables
can be obtained as numerical answers to items in a survey questionnaire
to be completed by knowledgeable individuals associated with
a particular effort.

.  The analyses performed to derive estimating equations are restricted
to program production costs, i.e., those that are incurred in program
design, code, and test activities. These activities include associated
documentation as well as work on the data base. (This particular
set of activities was chosen because they appear to be common to almost
all computer programming work.) Therefore, the scope of the work to
date does not include activities that may constitute a more generalized
model of computer programming associated with large information
processing systems; activities such as the system design and analysis
that may precede computer program design, and the test of the total
system that may follow test of the computer program components, have
been deliberately excluded to help us collect a large, consistent
sample of data.

.  Each member of the sample, i.e., the questionnaire data describing
each completed programming project, is referred to as a data point.
To qualify as a data point, the data must be for a programming effort
that resulted in the smallest set of instructions (1) whose purpose
is defined by someone other than the programmer, (2) which is
deliverable to the user as a package, and (3) which is loaded into
the computer as a unit or an integral part of a system, e.g., a
subprogram, to achieve the stated purpose.

One of these basic assumptions was modified in conducting the analysis
described in this report. Specifically, the assumption that all computer
programming has certain generalizable characteristics regardless of
application and resources used was partially abandoned by the formation of
subsamples that differentiated among characteristics of computer programs and
their products. In the first two phases of the work the data bases were too
small to permit extensive analysis of subsamples. However, in the third cycle,
we felt that we had enough data to examine cost differences for several types
of computer programs and production techniques such as the following:

.  Utility  and  support  programs,  e.g.,  compilers,  executives,  etc. 

.  Process-oriented  (scientific)  computer  programs,  e.g.,  algebraic 
and  statistical  calculations. 

.  File-oriented  (business)  computer  programs,  e.g.,  payroll,  billings, 
etc. 

.  All other computer programs not included above, such as command and control.

.  Computer  programs  developed  with  machine-oriented  languages  as 
opposed  to  those  developed  with  procedure-oriented  languages. 

.  Computer  programs  produced  with  small,  medium  or  large  computers 
(with  size  determined  on  cost  basis). 

.  Computer programs which were produced as parts of a larger system,
as opposed to programs developed as total entities in themselves.
APPENDIX B


DATA  COLLECTION  AND  VALIDATION 


The  following  describes  the  design  of  the  questionnaire  and  the  type  of  sample 
acquired  as  well  as  how  the  data  were  collected  and  validated  in  the  third 
cycle. 

1.  Design of Questionnaire. The data used as inputs in the analysis in all
three phases of the cost-estimation work have been collected by means of a
questionnaire, which consists of items called cost factors, that are variables
thought to affect computer programming costs, and various measures of cost.

Numerous  cost  factors  were  identified  and  formulated  on  the  basis  of  previous 
surveys  of  program  development  experience.  Specifically,  the  factors  represent 
answers  by  managers  to  such  questions  as,  "Why  did  you  overrun  your  budget?" 
or,  "Why  did  your  program  cost  more  or  take  longer  to  develop  than  another 
program  that  appears  to  be  similar?"  These  factors  became  items  in  the  data 
collection  questionnaire,  and  subsequently,  variables  in  statistical  analyses. 

Although  some  revisions  were  made  after  the  first  cycle  and  during  the  second, 
most  of  the  items  in  the  questionnaire  for  the  third  cycle  were  the  same  as 
those  used  in  the  previous  two  cycles. 

The  presumed  factors  were  classified  into  the  following  logical  groupings: 

.  The job to be done

.  The  resources  that  are  expected  to  be  available 

.  The  nature  of  the  expected  working  environment 

These groups were divided into seven categories which, together with an
additional section for costs, comprised the following eight divisions of the
questionnaire:

SUMMARY  OF  COSTS 

.  Cost  Measures.  Measures  of  cost  in  terms  of  resources  such  as  man 
months,  computer  hours,  and  elapsed  time. 

REQUIREMENTS  (the  job  to  be  done) 

.  Operational  Requirements  and  Design.  Operational  characteristics  of 
the  system  into  which  the  computer  program  will  fit  as  a  component, 
e.g.,  number  of  ADP  centers  in  system,  rating  of  information  system 
complexity. 
.  Program Design and Production. Factors in the design, coding, and
test of the computer program, e.g., number of classes of items in the
data base, number of types of output messages.

RESOURCES 

.  Data Processing Equipment. Characteristics of the hardware required
to produce and test a program, including all input, output, and
peripheral equipment, e.g., number of words in core storage, number
and types of I/O equipment.

.  Programming  Personnel.  Characteristics  of  the  personnel  needed  to 
completely  develop  the  computer  program,  e.g.,  number  of  programmers 
classified  as  coder,  programmer,  senior  programmer,  system  programmer; 
years  of  experience  for  each  category  of  programmer  with  language 
used,  computer  used,  and  specific  application. 

.  Utility Computer Programs. Characteristics of the computer programs
used as tools to produce the subject computer program, e.g., programming
language used in coding, number of free support programs available.

.  Management  Procedures.  Factors  associated  with  the  plans,  policies, 
practices,  and  review  techniques  used  in  the  administration  of  all 
phases  of  program  development,  e.g.,  existence  of  a  documented 
management  plan  for  processing  of  program  design  changes  and 
standards  for  coding  and  flow  charting. 

.  Development Environment. Factors describing relationships with
external organizations, including customers and other contractors,
e.g., number of agencies concurring on design specifications and
computer facility operated on the basis of open shop, closed shop,
or time-sharing.

This  organization  was  intended  to  permit  separation  of  the  questionnaire  into 
sections,  so  that  each  section  could  be  easily  delegated  within  the  responding 
organization  to  the  personnel  most  qualified  to  complete  it.  Appendix  D 
contains  a  complete  set  of  the  factors  and  costs  used  in  the  questionnaire 
for  the  third  cycle,  as  well  as  the  data  gathered  in  the  latest  collection 
effort. 

The questionnaires used for data collection in the second and third cycles
were modifications of those used in the first and second cycles respectively.
Extensive revision was not possible since the data collected by past
questionnaires were to be combined with data from the new collection effort in
order to build up the size of the base. Further, the data collection for the
third cycle was begun before the analysis of the second cycle was complete;
therefore, little or no feedback could be introduced in the questionnaire for
the third collection effort. Despite these restrictions, the following
revisions were made in the questionnaire before it was sent to the Air Force
and industrial organizations:

a.  Questions  that  were  found  to  yield  unreliable  answers  were  deleted. 

b.  Questions  were  amplified  to  gather  more  detailed  data. 

c.  Frequently  misinterpreted  questions  were  supplemented  with  definitions 
of  the  ambiguous  terms. 

d.  Comments  were  solicited  from  respondents  on  which  cost  factors  they 
deemed  most  important  to  cost  prediction. 

2.  Design of the Sample. As in the previous data collection efforts, no
rigorous sample design was used. To expedite the analysis in the first two
iterations, only data from within System Development Corporation were
collected. The third cycle, reported here, is the first attempt to gather and
analyze data from non-SDC sources. In all three collection efforts, the two
definitions outlined in Appendix A were used to control the sample data, by
specifying (a) the scope of the programming process and (b) what constitutes a
program data point.

3.  Validation of the Data. Each completed questionnaire that was received
became a component of a data matrix. The matrix consisted of 94 columns
representing the data and 106 rows representing the data points collected
outside of SDC. Every questionnaire was examined, and if an item was
(a) left blank, (b) apparently misinterpreted, or (c) not consistent with
other items in the particular collection form, a mark was placed in the
appropriate cell of the matrix.

The  completed  matrix  indicated  several  items  which  were  misinterpreted  or 
left  blank  frequently  enough  to  be  considered  unreliable.  Some  of  these 
variables,  e.g.,  number  of  instructions  discarded  due  to  operational  changes, 
were  dropped  from  further  consideration  in  the  analysis;  it  was  felt  that  even 
with  extensive  follow-up,  it  would  be  difficult  to  obtain  statistically  valid 
responses  for  these  items  due  to  the  unavailability  of  the  required  data. 
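
The marking scheme described above can be sketched as follows: each flagged
cell records why an item was marked, and items flagged for a large share of
the data points are treated as unreliable. The names, the dictionary
representation of the matrix, and the cutoff fraction are all assumptions
made for illustration; no numerical cutoff was fixed in the validation
itself.

```python
from collections import Counter

# Reasons a cell may be marked during examination of a questionnaire.
BLANK, MISINTERPRETED, INCONSISTENT = "blank", "misinterpreted", "inconsistent"


def flag_counts(flags, n_items):
    """Number of marked cells in each column (questionnaire item).

    flags maps (data_point_index, item_index) to the reason for the mark.
    """
    counts = Counter(item for (_, item) in flags)
    return [counts.get(i, 0) for i in range(n_items)]


def unreliable_items(flags, n_points, n_items, threshold=0.25):
    """Items marked in at least `threshold` of the data points.

    The threshold is a hypothetical cutoff chosen for this sketch.
    """
    counts = flag_counts(flags, n_items)
    return [i for i, c in enumerate(counts) if c / n_points >= threshold]


# Tiny invented matrix: 4 data points (rows) by 3 items (columns).
flags = {
    (0, 1): BLANK,
    (1, 1): MISINTERPRETED,
    (2, 0): INCONSISTENT,
    (3, 1): BLANK,
}
print(flag_counts(flags, n_items=3))
print(unreliable_items(flags, n_points=4, n_items=3, threshold=0.5))
```

Keeping the reason with each mark also makes it straightforward to prepare
the item-by-item follow-up questions sent to each respondent.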

After  further  study  of  the  matrix,  we  began  the  validation  process  to  improve 
the  quality  of  the  data.  A  package  containing  the  following  items  was  sent 
to  each  respondent: 

a.  A  duplicate  copy  of  the  questionnaire  originally  completed  by  the 
respondent,  for  reference. 

b.  A supplementary form containing

(1)  clarification and definition of five frequently misinterpreted
items, and

(2)  two additional items that were deleted from the original
questionnaire but were later found to be needed in combining
the new data with the SDC data.

c.  A list of questions on specific items in that particular respondent's
questionnaire.

Each contributor was then contacted by telephone to discuss details of his
particular problems. In most cases, questionable items were corrected.
However, in a few instances, certain problems could not be corrected, either
because the original respondent was no longer available or the existing records
could not provide the answers. Five data points were finally dropped due to
the general unreliability or unavailability of the data.

Therefore, 101 new data points of the original 106 remained to be combined
with the 74 data points from SDC used in the second cycle. The total 175
data points became the inputs to the statistical analysis for the third
cycle. On the basis of the statistical characteristics, e.g., outliers, six
more data points were eliminated, leaving 169 points for further analysis.
APPENDIX C


STATISTICAL  METHODS 


1.  The  Linear  Multivariate  Regression  Method.  The  primary  statistical  tool 
employed  in  the  work  to  date  has  been  multivariate  regression  analysis. 
Mathematically,  this  method  involves  the  fitting,  by  least-squares  procedures, 
of  an  equation  of  the  form: 

$$Y = A + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n,$$

where:

$Y$ = The value of the cost measure to be estimated.

$A$ = A constant.

$b_i$ = The relative weight assigned to the "ith" cost predictor.

$X_i$ = The value of the "ith" cost predictor.

Each estimate is subject to prediction error as compared to actual costs.
This error is determined by the estimating power of the predictors used and
the interrelationships between them.

The statistical model does not guarantee meaningful prediction equations
per se. This meaningfulness must come from the choice of the predictor
variables used and their definition and interpretation.
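
The least-squares fitting described in this subsection can be sketched in a
few lines by forming and solving the normal equations. The function name, the
two predictors, and the data are invented for illustration, and the sketch is
not the regression program actually used in this work.

```python
def fit_linear(rows, y):
    """Least-squares fit of Y = A + b1*X1 + ... + bn*Xn.

    rows: list of predictor tuples (X1, ..., Xn); y: list of observed costs.
    Returns [A, b1, ..., bn], obtained by solving the normal equations.
    """
    design = [[1.0, *r] for r in rows]  # the leading 1 carries the constant A
    m = len(design[0])
    # Normal equations (X'X) c = X'y, held as an augmented matrix.
    aug = [
        [sum(d[i] * d[j] for d in design) for j in range(m)]
        + [sum(d[i] * yi for d, yi in zip(design, y))]
        for i in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


# Invented data: two predictors per data point and the resulting cost,
# generated exactly from Y = 2 + 3*X1 + 0.5*X2 so the fit can be checked.
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
costs = [2 + 3 * x1 + 0.5 * x2 for x1, x2 in predictors]
a, b1, b2 = fit_linear(predictors, costs)
print(f"A = {a:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

With real data the fit is not exact, and the residuals left over are what the
standard error of estimate summarizes. Strongly interrelated predictors make
the normal equations nearly singular, which is one reason the winnowing
described below favors predictors with low correlations with each other.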

2.  Formal  Analysis.  The  formal  analysis  of  the  research  consists  of  the 
following  two  main  tasks: 

.  Winnowing  of  variables 

.  Development  of  cost  estimating  equations 

Winnowing consists of reducing the number of independent variables to be
considered as predictors in the derivation of cost estimating equations.
This is a rather difficult task when the number of available predictors is
very large. The following steps were used as a basis for reducing the
variables to be considered:

a.  Examination of Raw Data. The responses to the questionnaire were
tabulated in a data matrix (see Appendix D) in which each column (variable)
was examined and evaluated. Six of the original variables were immediately
rejected for one or more of the following reasons: (1) frequently left blank,
(2) unavailability of data for validation, (3) apparently ambiguous question,
(4) lack of intuitive appeal based on judgment and previous experience. In
addition, five data points were found to have numerous questionable values
for variables that could not be resolved by follow-up procedures. These were
dropped from the analysis.

A  frequency  distribution  analysis  was  made  of  all  cost  and  predictor  variables 
and  six  more  data  points  were  dropped,  due  to  the  extremely  large  size  of 
their  costs.  Their  inclusion  would  have  biased  the  least  squares  computations 
in  their  favor  while  resulting  in  equations  of  little  predictive  value  for 
estimating  the  bulk  of  programming  efforts.  The  deletion  of  these  six  points 
resulted  in  a  considerable  reduction  in  the  range  of  all  cost  measures.  For 
instance,  before  these  points  were  eliminated,  the  range  of  man  months 
extended  from  1  to  1653;  after  elimination,  the  upper  boundary  of  the  range 
was  reduced  to  300  man  months  (see  Figure  1). 

b.  Scatterplot Analysis. Following the above analysis, machine scatterplots
(6) were obtained for all potential predictors against the four primary
cost measures. Visual analysis of these scatterplots was made for the purpose
of detecting discontinuities and other unusual characteristics of the data.
Such conditions were noted whenever they occurred, and this information was
considered in the predictor winnowing process.

c.  Correlation  Analysis.  In  the  next  analysis,  a  complete  correlation 
matrix  was  computed  (6).  This  matrix  depicted  the  statistical  relationship 
of  each  predictor  with  every  other  predictor  and  with  each  cost  measure. 
Correlation  analysis  is  used  as  an  aid  in  winnowing  variables  by  allowing 
predictors  to  be  selected  that  have  high  correlations  with  cost  and  low 
correlations  with  each  other  (4).  All  of  the  above  considerations,  in 
conjunction  with  available  programming  management  experience  and  judgment, 
were  used  to  develop  a  winnowed  (i.e.,  reduced)  set  of  predictor  variables. 
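
The selection rule described in step c (high correlation with cost, low correlation with predictors already chosen) can be sketched as follows. This is only an illustration: the function names, the greedy ordering, and the two thresholds are assumptions, and the report also relied on management experience and judgment that no such rule captures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def winnow(predictors, cost, min_cost_r=0.5, max_inter_r=0.8):
    """Greedy sketch: take predictors in order of |r| with cost, skipping any
    that correlate too strongly with a predictor already chosen.
    Both thresholds are illustrative assumptions."""
    ranked = sorted(predictors,
                    key=lambda name: abs(pearson(predictors[name], cost)),
                    reverse=True)
    chosen = []
    for name in ranked:
        if abs(pearson(predictors[name], cost)) < min_cost_r:
            break  # remaining predictors correlate even less with cost
        if all(abs(pearson(predictors[name], predictors[c])) <= max_inter_r
               for c in chosen):
            chosen.append(name)
    return chosen
```

Given a dictionary mapping predictor names to columns of values and a list of costs, winnow returns the names that survive both tests.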

d. Regression Analysis. After the correlation analysis, there were still
too many predictor variables to allow computation of meaningful estimating
equations. Trial multivariate regression analyses are now being used to
further reduce the list of potential predictors. These regression solutions
do not lead directly to the final prediction equations but they do allow a
more complete analysis of the interaction of predictor variables when used
in an estimating formula.

In using regression analysis as a predictor selection device, we are examining
the computed standard regression coefficient (beta coefficient) for each
predictor in the trial solution (5). If this coefficient is very low or if
its algebraic sign is contrary to good judgment, the variable is generally
rejected. In a few instances, variables with low (but meaningful) beta
coefficients are being retained due to their extremely high intuitive
appeal. A complete series of trial regression runs is planned with various
combinations and selections of predictors regressed against each cost measure.
After this series of runs, the best rational and statistical solution will be
chosen for each cost measure.
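
For readers unfamiliar with beta coefficients, the sketch below computes them from correlations alone, using the standard identity that the standardized coefficients solve R b = r, where R is the predictor intercorrelation matrix and r the vector of predictor-cost correlations. The function names are hypothetical, and this is not the regression program used in the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Raises ZeroDivisionError if the matrix is singular."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def beta_coefficients(predictors, cost):
    """Standardized regression coefficients, keyed by predictor name."""
    names = list(predictors)
    r_xx = [[pearson(predictors[i], predictors[j]) for j in names] for i in names]
    r_xy = [pearson(predictors[i], cost) for i in names]
    return dict(zip(names, solve(r_xx, r_xy)))
```

A predictor whose beta is near zero, or whose sign contradicts experience, would be a candidate for rejection under the rule described above.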

3. Development of Cost-Estimating Equations. A multiple regression computer
program (7) operating in conjunction with the IBM 7094 computer is the primary
tool used to compute the many trial regression solutions required. The
program computes and outputs the following statistics: weighted regression
coefficients, means, standard deviations, intercorrelations for the predictors
being regressed, the standard error of estimate, the coefficient of
determination, the multiple correlation coefficient, and the regression constants.
The program also includes a subsetting option, which selects the best
statistical predictors based on predictor-cost correlations and
intercorrelations. When this option is used, the statistical outputs of the
program are augmented by the following: the difference in the multiple
correlation between the chosen predictors and the total input set of
predictors, and the corresponding F ratio relating to the significance of
the difference between the variances accounted for by the partial set and
the total set input.
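
The report does not give the formula the program uses for this F ratio. The usual form for comparing a subset regression with the full regression is shown below as an assumption: n is the number of data points, k the number of predictors in the total input set, q the number in the chosen subset, and R_k and R_q the corresponding multiple correlation coefficients.

```latex
F = \frac{\left(R_k^{2} - R_q^{2}\right) / (k - q)}
         {\left(1 - R_k^{2}\right) / (n - k - 1)},
\qquad \text{with } (k - q) \text{ and } (n - k - 1) \text{ degrees of freedom.}
```

A small F indicates that the predictors left out of the subset account for little additional variance in the cost measure.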

Of  the  statistics  output  by  the  program,  the  standard  error  of  estimate  and 
the  coefficient  of  determination  are  the  principal  indices  used  to  assess 
the  statistical  accuracy  of  the  estimating  equations.  The  standard  error 
of  estimate  is  a  measure  of  the  estimating  precision  of  the  equation,  and 
the  coefficient  of  determination  (the  multiple  correlation  squared)  indicates 
the  proportion  of  the  variance  in  the  cost  measure  that  is  accounted  for  by 
the  equation.  High  statistical  accuracy  for  an  estimating  equation  is 
indicated  by  a  low  standard  error  of  estimate  and  a  high  coefficient  of 
determination.
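
The two indices can be computed from actual and estimated costs as in the following sketch. The function name is hypothetical, and the divisor n - k - 1 for the standard error is the conventional choice, assumed here because the report does not state which divisor the program uses. The expression for the coefficient of determination equals the squared multiple correlation only when the estimates come from a least squares fit that includes a constant term.

```python
from math import sqrt


def fit_statistics(actual, predicted, n_predictors):
    """Return (standard error of estimate, coefficient of determination)."""
    n = len(actual)
    mean_actual = sum(actual) / n
    # Sum of squared errors about the estimates, and about the mean.
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    std_error = sqrt(sse / (n - n_predictors - 1))  # assumed divisor
    r_squared = 1.0 - sse / sst
    return std_error, r_squared
```

The standard error is expressed in the units of the cost measure (for example, man months), whereas the coefficient of determination is a pure proportion.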
APPENDIX  D


LIST  OF  QUESTIONNAIRE  ITEMS  AND  CORRESPONDING  DATA  MATRIX 


This Appendix contains a list of 94 variables corresponding to the items in
the data collection questionnaire. Also included are the values submitted
by the respondents for each of the variables and the coding used (if any)
in recording the data.

The data for the 169 points in the sample are presented in 12 groups. Each
group begins with a fold-out page consisting of definitions for 8 variables.
Following the definitions, there are 4 pages of values (169 points),
corresponding to the variables on the fold-out. The 169 data points are numbered
in ascending (but not sequential) order; the last number is 205.

It  should  be  noted  that  several  variables  were  scaled,  e.g.,  recorded  in 
thousands.  Also,  blank  data  in  this  matrix  are  denoted  by  a  negative 
zero  (-0). 
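
When the matrix is read by program, the negative zero must be kept distinct from a true zero. A minimal sketch, assuming each entry is available as printed text (for example "53.000", "0." or "-0."); the function names are hypothetical.

```python
def parse_value(token):
    """Convert one printed matrix entry to a float, or None for a blank (-0)."""
    text = token.strip()
    value = float(text)
    if value == 0.0 and text.startswith("-"):
        return None  # negative zero marks a blank questionnaire response
    return value


def parse_row(tokens):
    """Convert one row of printed entries, preserving blanks as None."""
    return [parse_value(t) for t in tokens]
```

Keeping blanks as None prevents them from being averaged or regressed as if the respondent had reported a zero.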
Sample  Number


Average Round-Trip Distance per man trip. Entered directly from questionnaire.
If distance was less than 10 miles, a zero was entered.


Number  of  Man  Trips  required  for  concurrence  during  program  design,  code,  and 
test.  Entered  from  questionnaire.  If  the  average  round-trip  distance  was 
less  than  10  miles,  a  zero  was  entered. 


Computer  Hours  used  by  all  computers  implemented  in  program  design,  code,  and 
test. 


Months  Elapsed — completion  date  for  program  delivery  minus  start  date  for 
program  design.  At  the  time  of  program  delivery  the  program  is  ready  to  be 
installed  in  the  operational  computer  to  begin  system  test.  Coded  in  months. 


Man  Months,  including  man  months  expended  on  utility  and  executive  programs 
developed  specifically  for  the  data  point.  Entered  from  questionnaire. 


Man  Months  to  design,  code,  and  test  the  program  and  the  utility  programs  used 
in  production.  Entered  from  questionnaire. 


Man Months to develop utility programs used in program design, code, and test.
Entered from questionnaire.


Man Months, not including man months to develop the utility programs used in
program design, code, and test. Entered from questionnaire.


Sample  Number

Sample         1         2         3         4         5         6         7         8
    44    53.000     2.000        0.        0.    10.000   274.000     4.000  3000.000
    45    60.000        0.        0.        0.    14.000   550.000        0.        0.
    46    64.000     1.000        0.        0.     9.000   633.000        0.        0.
    47    66.000    33.000        0.        0.    36.000  2100.000     3.000  4260.000
    48    79.000    12.000        0.        0.     6.000   600.000        0.        0.
    49    96.000   112.000        0.        0.    25.000   300.000     4.000  1100.000
    50   110.000    23.000        0.        0.     6.000   600.000    20.000   250.000
    52   121.000        0.        0.        0.    19.000   443.000     5.000  3000.000
    53   130.000    12.000        0.        0.    18.000   880.000    16.000  2000.000
    54   143.000   245.000        0.        0.    19.000   130.000        0.        0.
    55   183.000   112.000        0.        0.    14.000   195.000    42.000  3100.000
    56   183.000     9.000        0.        0.    12.000   581.000    25.000  3000.000
    57   185.000        0.        0.        0.    23.000   427.000        0.        0.
    58   190.000     3.000        0.        0.    29.000  1420.000    30.000  1000.000
    59   201.000   112.000        0.        0.    16.000   320.000        0.        0.
    60   207.000   168.000        0.        0.    25.000   990.000    30.000  3370.000
    61   232.000   245.000        0.        0.    21.000   850.000    10.000  1000.000
    62   256.000     2.000        0.        0.    14.000  1291.000    15.000  3000.000
    63   260.000     5.000        0.        0.    20.000  1625.000    21.000  3000.000
    64   300.000     1.000        0.        0.    16.000  1095.000    20.000  3000.000
    71    53.000     1.000        0.        0.     9.000   274.000     4.000  3000.000
    72   121.000     5.000        0.        0.    19.000   443.000     5.000  3000.000
    80    27.000        0.        0.        0.     4.000   222.000        0.        0.
    84    98.000     3.000        0.        0.    26.000   250.000   420.000   120.000
    85    13.000        0.        0.        0.     8.000    91.000     2.000  1400.000
    86    50.000        0.        0.        0.     4.000   100.000        0.        0.
   100        0.        0.     4.000     3.000     2.000    43.000    12.000     5.000
   101        0.        0.     3.000     2.000     5.000     8.000    15.000     1.000
   102        0.        0.     4.000        0.     5.000    15.000     6.000   170.000
   104        0.        0.     4.000        0.     3.000     6.000        0.        0.
   105        0.        0.     1.000        0.     1.000     2.000        0.        0.
   106        0.        0.     2.000        0.    10.000     4.000        0.        0.
   107        0.        0.    13.000     5.000     3.000    32.000     3.000  1100.000
   109        0.        0.     6.000        0.     5.000    26.000        0.        0.
   110        0.        0.    10.000        0.     3.000     3.000        0.        0.
   111        0.        0.     7.000        0.     5.000    19.000    35.000     2.000
   112        0.        0.     5.000     1.000     3.000     3.000    11.000     2.000
   113        0.        0.     2.000        0.     2.000     2.000        0.        0.
   114        0.        0.    33.000        0.    11.000    53.000     7.000    16.000
   115        0.        0.     5.000        0.    14.000    18.000        0.        0.
   116        0.        0.     1.000        0.     4.000     4.000        0.        0.
   117        0.        0.     3.000        0.    10.000     6.000        0.        0.

Sample         1         2         3         4         5         6         7         8
   118        0.        0.     5.000        0.     5.000     5.000        0.        0.
   119        0.        0.     1.000        0.     5.000     5.000        0.        0.
   120        0.        0.     1.000        0.     4.000     3.000        0.        0.
   121        0.        0.    82.000        0.     9.000    28.000     8.000  1100.000
   122        0.        0.     6.000        0.     5.000     2.000        0.        0.
   123        0.        0.     5.000        0.    15.000     6.000        0.        0.
   124        0.        0.     3.000        0.     4.000     7.000        0.        0.
   125        0.        0.     2.000        0.     2.000     3.000        0.        0.
   126        0.        0.     1.000        0.     3.000     5.000        0.        0.
   127        0.        0.     2.000        0.     4.000    11.000        0.        0.
   128        0.        0.     3.000     1.000     2.000     6.000     6.000       -0.
   129        0.        0.     1.000        0.     1.000     2.000        0.        0.
   130        0.        0.     2.000        0.     2.000     1.000        0.        0.
   131        0.        0.     4.000        0.     4.000     7.000        0.        0.
   132        0.        0.    15.000        0.    21.000    63.000        0.        0.
   133        0.        0.     2.000     1.000     1.000    34.000     1.000  2400.000
   134        0.        0.     2.000        0.     3.000     5.000        0.        0.
   135        0.        0.     3.000        0.     3.000    13.000        0.        0.
   136        0.        0.     8.000        0.     8.000     5.000        0.        0.
   137        0.        0.     1.000        0.     3.000     2.000    18.000  1178.000
   138        0.        0.     2.000        0.     5.000     1.000        0.        0.
   139        0.        0.     3.000        0.     3.000     1.000        0.        0.
   140        0.        0.     2.000        0.     8.000    15.000        0.        0.
   141        0.        0.     2.000        0.     3.000     4.000     2.000   280.000
   142        0.        0.    21.000        0.     8.000    40.000    30.000   300.000
   143        0.        0.    15.000        0.     5.000   100.000        0.        0.
   144        0.        0.     5.000        0.     3.000    20.000     2.000  2500.000
   145        0.        0.    63.000     9.000    26.000   100.000    30.000   950.000
   146        0.        0.     9.000        0.    10.000    80.000    10.000    15.000
   147        0.        0.    10.000     1.000     5.000    16.000     4.000   290.000
   148        0.        0.   166.000     3.000     9.000  1918.000   104.000   100.000
   149        0.        0.     4.000        0.     4.000    11.000     1.000   125.000
   150        0.        0.     3.000     1.000     8.000     6.000    32.000    22.000
   151        0.        0.     2.000        0.     4.000    10.000        0.        0.
   152        0.        0.     2.000        0.     6.000     8.000        0.        0.
   153        0.        0.     1.000        0.     1.000     4.000        0.        0.
   154        0.        0.     4.000        0.     4.000    12.000        0.        0.
   155        0.        0.    16.000     2.000    20.000   150.000        0.        0.
   156        0.        0.     6.000        0.    14.000    30.000        0.        0.
   157        0.        0.     2.000        0.     2.000     2.000        0.        0.
   158        0.        0.     3.000        0.     3.000    10.000        0.        0.
   159        0.        0.     5.000        0.     3.000    11.000        0.        0.

Sample         1         2         3         4         5         6         7         8
   160        0.        0.     2.000        0.     3.000     3.000        0.        0.
   161        0.        0.     1.000        0.     1.000     6.000        0.        0.
   162        0.        0.     3.000     1.000     3.000    11.000        0.        0.
   163        0.        0.     1.000        0.     1.000     7.000        0.        0.
   164        0.        0.     2.000        0.     4.000     5.000        0.        0.
   165        0.        0.     1.000        0.     3.000    15.000        0.        0.
   166        0.        0.    30.000        0.     5.000   130.000        0.        0.
   167        0.        0.    65.000        0.    18.000   420.000     2.000  1200.000
   168        0.        0.     3.000        0.     3.000     7.000        0.        0.
   169        0.        0.     1.000        0.     3.000    14.000        0.        0.
   170        0.        0.     6.000     4.000     4.000    50.000        0.        0.
   171        0.        0.     2.000        0.     3.000     5.000        0.        0.
   173        0.        0.     4.000        0.     3.000    10.000        0.        0.
   174        0.        0.    52.000        0.    27.000   700.000        0.        0.
   175        0.        0.   109.000        0.    13.000  1000.000    10.000   400.000
   176        0.        0.    58.000     1.000     9.000   400.000     2.000   360.000
   177        0.        0.    95.000     4.000    20.000   930.000    10.000   350.000
   179        0.        0.    79.000        0.    21.000   200.000        0.        0.
   180        0.        0.   100.000        0.    21.000   200.000        0.        0.
   181        0.        0.   152.000    17.000    25.000  1022.000        0.        0.
   182        0.        0.   135.000        0.    13.000   450.000    20.000   400.000
   184        0.        0.     7.000        0.    11.000   100.000     2.000   300.000
   185        0.        0.    90.000        0.    22.000

Sample  Number

16  Design Characteristics of the Program Data Point. Coded: direct translation of
manual tasks to automatic functions = 0; relatively few, well defined functions
to be automated = 1; many clear and well defined functions to be automated = 2;
many undefined and unstructured functions to be automated = 3.

15  Operational Characteristics of the Program Data Point. Coded: no on-line,
real-time operation = 0; mixture of on-line and off-line operations = 1;
mainly on-line, real-time operation = 2.


14  Number of computer-based centers with which the Program Data Point must
communicate. Entered from questionnaire.


13  Extent of response time requirements imposed by the organizational users.
Coded: greater than 1 day = 0; 24 hours or less = 1; 1 hour or less = 2;
real time = 3.

12  With how many Organizational Users (interfaces) must the Program Data Point
communicate? Entered from questionnaire.


11  How well were the Program Data Point's operational requirements known and
documented? Coded: in detail = 0; in outline = 1; vaguely = 2.

10  Participation of Programming Organization in the requirements analysis and/or
operational design of the Program Data Point. The requirements analysis is
conducted to specify the performance requirements of the information-processing
system. These performance requirements are input to the operational design
activity, which indicates how the information processing needs will be
satisfied. Coded: extensive participation = 0; intermittent participation = 1;
minimal participation = 2.

9  Need for innovation in the information processing system. Innovation means
either a new data processing application of a known programming technique
and/or a new technique for a known application. New means to the people
involved. Coded: Yes = 1; No = 0.

Sample  Number


Number of Input Message Types. Message types could be unique displays or
messages (these may be variations of data within their specific formats).
Entered from questionnaire.


Number of Classes of Items in the Data Base. Classes means categories of types
of items such as names, salaries, states, or any characteristics of information
for which there are many items or entries. Entered from questionnaire.


Number of Words in the Data Base. Data Base is the subset of tables that
describe the environment of the problem that the program is solving and/or
files to be processed. Entered from questionnaire, in thousands.


Number of subroutines, i.e., a set of well defined instructions to carry out a
mathematical or logical operation. Entered from questionnaire.


Percent delivered object instructions to total number of object instructions
written or generated specifically for this Program Data Point. Derived from
questionnaire. Coded in percent.


Number of source instructions written in procedure-oriented language (POL).
Entered from questionnaire, in thousands.


Number of source instructions written in machine-oriented language (MOL).
Entered from questionnaire, in thousands.


Total Number of object instructions written or generated specifically for this
Program Data Point. Entered from questionnaire, in thousands.


Sample  Number 


265  0.265  0.  20.000  6.000  0.075  5.000  1.000 


Sample        17        18        19        20        21        22        23        24
    44    10.000    10.000        0.   100.000    14.000     0.985     6.000     6.000
    45    35.000    35.000        0.   100.000    16.000   100.000   255.000     8.000
    46    14.100    14.100        0.   100.000     8.000     1.000    20.000     4.000
    47    40.000        0.    30.000    31.000    12.000    20.000  2048.000     3.000
    48    25.000    25.000        0.    65.000     1.000     0.010     1.000     8.000
    49    16.900    16.900        0.   100.000     5.000     0.363     4.000    21.000
    50    21.590    21.590        0.    97.000    17.000     4.000    30.000     2.000
    52    19.000    19.000        0.    76.000    19.000     1.200     6.000     3.000
    53    10.000    10.000        0.    38.000    35.000     3.400    12.000    16.000
    54    26.900    26.900        0.   100.000    14.000     3.980   132.000     5.000
    55    58.300    58.300        0.   100.000     1.000    10.000    10.000        0.
    56    21.000    21.000        0.    60.000    36.000     1.990     7.000     3.000
    57    25.000        0.     4.000   100.000     6.000     2.000     8.000     1.000
    58    15.000    15.000        0.    50.000    71.000     5.000    25.000    31.000
    59    31.500    31.500        0.    76.000    16.000   110.000   102.000     9.000
    60    56.000    56.000        0.    30.000    98.000     2.200    24.000    25.000
    61    45.400    45.400        0.   100.000    21.000    95.000   105.000    10.000
    62    30.000    30.000        0.    77.000    36.000     2.090     7.000     3.000
    63    50.000    50.000        0.   100.000    31.000     0.501     1.000        0.
    64    17.000    17.000        0.    86.000    36.000     2.190     7.000     3.000
    71     7.500     7.500        0.   100.000    10.000     0.400     5.000     4.000
    72    16.000    16.000        0.    64.000    17.000     8.525    26.000     4.000
    80    17.000    17.000        0.    11.000     3.000     0.800     1.000        0.
    84    12.898     1.000     3.000    13.000    32.000    57.142   256.000    24.000
    85     8.400     1.100     2.700    88.000     2.000     1.740   150.000        0.
    86     3.000     3.000        0.    50.000     9.000     0.800    80.000    22.000
   100     1.298     1.115        0.   100.000    34.000     0.068     5.000     8.000
   101     1.064     0.630        0.   100.000    27.000     0.069     2.000     6.000
   102     0.655     0.211        0.   100.000     8.000   240.000    16.000  9999.000
   104     4.200     2.400        0.   100.000     4.000     1.200    21.000    10.000
   105     1.634     1.105        0.   100.000    18.000  2200.000    60.000     4.000
   106     0.658     0.287        0.    51.000        0.     0.370    29.000     1.000
   107    30.072     9.750        0.   100.000   100.000     0.869   250.000    16.000
   109    83.336    12.000    10.000   100.000        0.    15.000    20.000     7.000
   110     8.280     4.200        0.   100.000     2.000     1.200   400.000    12.000
   111     1.172     1.086        0.   100.000    45.000     0.020    10.000    13.000
   112     1.893     1.655        0.   100.000    36.000     4.370    12.000     2.000
   113     2.589     0.030     0.720   100.000     3.000    38.339    25.000  1500.000
   114     0.850     0.850        0.   100.000    12.000  1400.000    50.000     6.000
   115     1.242     0.447     0.214    17.000     2.000       -0.     4.000     1.000
   116     0.648     0.281     0.029    18.000     4.000    34.560    13.000    13.000
   117     1.941     0.026     0.154    25.000        0.       -0.     5.000     2.000

Sample        17        18        19        20        21        22        23        24
   118     1.220     0.333     0.174    47.000        0.   180.000    23.000     1.000
   119     2.055     0.025     0.275    32.000     1.000     7.479   105.000     1.000
   120     2.509     1.744        0.    66.000    12.000       -0.     3.000     1.000
   121    29.000    28.000        0.   100.000   100.000   229.600   111.000    10.000
   122    43.500     8.000        0.    52.000     2.000       -0.    25.000     2.000
   123    26.000    26.000        0.   100.000     2.000 53250.000 99999.000     2.000
   124     0.316     0.030     0.070     2.000     3.000     0.165    12.000  7534.000
   125     2.075     1.522        0.     6.000     2.000 17668.356   323.000   323.000
   126     1.314     0.472     0.059    14.000     6.000  7438.859    57.000  9999.000
   127     0.217        0.     0.052     1.000    32.000    42.000   155.000  9999.000
   128     0.363     0.303        0.    25.000     1.000 12247.053   192.000  9999.000
   129     2.105     0.395     0.328    45.000     3.000  1373.725    25.000  9999.000
   130     0.800     0.800        0.   100.000     4.000    43.200    14.000     2.000
   131       -0.        0.     1.321   100.000     4.000    60.000    60.000    18.000
   132    19.462     0.178     4.163    66.000    45.000     0.329   145.000    32.000
   133     0.980     0.980        0.   100.000    12.000     3.568     5.000     3.000
   134     1.716     1.697        0.   100.000     4.000     0.527    29.000     7.000
   135     4.850     0.003     1.300    97.000    60.000     0.152    45.000     6.000
   136     1.748     1.748        0.   100.000    12.000     0.578    37.000     3.000
   137     1.000     0.500        0.   100.000    10.000     0.012    12.000     2.000
   138     0.264     0.145        0.    20.000    20.000     0.136    13.000     2.000
   139     0.473     0.402        0.    32.000    65.000     0.260    24.000     6.000
   140     2.150     1.150        0.   100.000     7.000   213.125     1.000     4.000
   141     0.550     0.550        0.   100.000     1.000    40.000    16.000     5.000
   142    15.000    15.000        0.   100.000    25.000       -0.       -0.     1.000
   143     6.000     0.250     3.500    50.000    90.000     0.100     7.000    25.000
   144     5.000     4.500        0.   100.000    36.000     1.000    50.000    10.000
   145    35.000    25.000     1.500    88.000   180.000     5.000  1000.000    20.000
   146     1.200     1.200        0.   100.000    35.000    50.000    24.000    21.000
   147     3.875     3.300        0.   100.000    35.000     3.800    20.000     6.000
   148    20.000    20.000        0.    25.000   160.000   100.000   120.000     4.000
   149     0.725     0.725        0.    96.000     2.000     2.400    12.000     5.000
   150     2.000     2.000        0.   100.000    14.000     0.148    37.000     3.000
   151     5.875     5.875        0.   100.000     5.000     8.600    12.000     3.000
   152     2.750     2.750        0.    62.000     2.000    97.700    30.000     2.000
   153     2.150     2.150        0.    56.000        0.   228.000    19.000     6.000
   154     3.000     3.000        0.    80.000        0.   300.000   150.000     4.000
   155     7.076     7.076        0.    25.000    60.000   673.000    56.000    20.000
   156     5.528     5.528        0.    55.000    70.000   200.000   112.000    15.000
   157     6.500        0.     0.450   100.000     6.000     0.125    25.000    10.000
   158     3.650     3.650        0.    68.000    21.000   855.000    19.000    20.000
   159    14.300        0.     1.900   100.000    12.000  1900.000    50.000     6.000

Sample        17        18        19        20        21        22        23        24
   160    10.050        0.     1.028   100.000    20.000    64.500    17.000     6.000
   161     2.965        0.     0.740   100.000        0.   200.000    19.000     6.000
   162     5.266        0.     1.464   100.000        0.   973.000    40.000     1.000
   163     0.714        0.     0.216   100.000        0.    90.600     3.000     1.000
   164    16.000        0.     0.790   100.000    18.000     0.600     3.000     3.000
   165     6.500     6.500        0.   100.000        0.     0.150    40.000     8.000
   166    82.000    69.000        0.    53.000   400.000   331.000    90.000    24.000
   167   217.000   110.000     5.000   100.000   400.000   130.000    90.000   101.000
   168    16.767        0.     2.646   100.000    14.000   415.000    15.000     9.000
   169     8.040        0.     0.328   100.000     6.000     0.104    57.000     4.000
   170     6.500     6.500        0.    80.000     1.000  1184.000   145.000     5.000
   171     2.700     2.700        0.    57.000    16.000  1200.000    30.000  9999.000
   173     7.000        0.     0.350   100.000        0.   430.000    43.000    14.000
   174    18.000    18.000        0.   100.000        0.       -0.       -0.       -0.
   175    12.000    12.000        0.   100.000    25.000       -0.       -0.     1.000
   176     9.500     9.500        0.   100.000       -0.       -0.       -0.       -0.
   177    30.000    30.000        0.   100.000   125.000       -0.       -0.     8.000
   179     3.100     3.100        0.   100.000   100.000       -0.       -0.       -0.
   180     1.000     1.000        0.   100.000    25.000       -0.       -0.     2.000
   181    26.297    26.297        0.   100.000    47.000       -0.       -0.     2.000
   182    12.000    12.000        0.   100.000   220.000     0.125     6.000     3.000
   184     3.000     3.000        0.   100.000    10.000     0.300    40.000    50.000
   185    10.000     7.000        0.   100.000    25.000       -0.       -0.     1.000
   186    30.000    25.000        0.   100.000    20.000       -0.       -0.     2.000
   187     4.500     4.000        0.   100.000    25.000       -0.       -0.       -0.
   188    27.000    22.000        0.   100.000    25.000       -0.       -0.     1.000
   189    18.000    15.000        0.   100.000    20.000       -0.       -0.       -0.
   190    10.650     9.800        0.    93.000    52.000     2.100    40.000    15.000
   191    48.000    40.000        0.   100.000    25.000       -0.       -0.     1.000
   192     0.200     0.200        0.    60.000    27.000   153.000    45.000     2.000
   193     1.906     1.630        0.    99.000    30.000       -0.       -0.     6.000
   194    49.800     0.630     2.300   100.000    70.000       -0.       -0.       -0.
   195     3.225     2.025        0.   100.000    80.000       -0.       -0.     3.000
   196     2.603        0.     0.685   100.000        0.     0.058    58.000     4.000
   197     0.920     0.900        0.    94.000    25.000    18.000   100.000     1.000
   198     3.000        0.     1.000    65.000   140.000   200.000   300.000     8.000
   199    25.850    25.000        0.    53.000    68.000  6058.000    78.000    60.000
   200     0.979     0.695        0.    82.000    16.000     0.157    20.000     3.000
   201     9.000     9.000        0.   100.000    10.000     5.000     5.000       -0.
   202     6.342        0.     1.467   100.000     1.000    12.000    12.000    16.000
   203    18.000        0.     1.500   100.000     2.000     4.900    20.000     5.000
   204     5.035        0.     1.610       -0.    41.000    14.000    23.000     8.000
   205    11.000    10.000        0.   100.000    57.000     3.500     9.000     4.000


Sample  Number 


32  Percent Input/Output Instructions to perform data acceptance and output
formatting. Coded in percent.


31  Percent Mathematical Instructions devoted to evaluating and computing algebraic,
mathematical, geometric, and trigonometric formulas. Coded in percent.


30  Percent Clerical Instructions, e.g., bookkeeping, sorting, searching, and file
maintenance instructions. Coded in percent.


29  Complexity of Communication, referring to interprogram communications problems.
Coded: less than 10% of the program design devoted to communication problems = 0;
10% to 50% of the program design devoted to communication problems = 1; more
than 50% of the program design devoted to communication problems = 2.


28  Stability of Program Design. Coded: initial design carried through without
change = 0; few changes to initial program design = 1; frequent changes to
program design = 2; initial program design almost completely revised = 3.


27  Average Number of Output Items per message type. Entered from questionnaire.


26  Number of Output Message Types. Entered from questionnaire.


25  Average Number of Input Items per input message type. Entered from
questionnaire.

0.

30.000


Sample  Number 


Total  Number  of  Pages  of  Internal  Documentation  =  number  of  types  of  internal 
documentation  x  average  number  of  pages /internal  document. 


Number  of  Types  of  Internal  Documentation,  i.e. ,  distinct  documents  for  use  of 
programming  organization.  Entered  from  questionnaire. 


MOL versus POL. Coded: MOL = 1; POL = 0. MOL uses machine-oriented assembly
symbolic language source statements; POL uses procedure-oriented or compiler
language for source statements.


Was  timing  constraint  a  factor  in  program  design,  code,  and  test?  Coded: 
Yes  =  1;  No  =  0. 


Was  insufficient  memory  a  factor  in  program  design,  code,  and  test?  Coded: 
Yes  =  1;  No  =  0. 


Average Frequency of Operation. Coded: not applicable = 0; less than 1/month = 1;
weekly to monthly inclusive = 2; 24 hours to weekly inclusive = 3; daily = 4;
utility or on-line (including compilers) = 5.


Average Operate Time of Program Data Point. Coded: not applicable = 3; less
than 15 minutes = 2; 15 minutes to 1 hour = 1; greater than 1 hour = 3.


Number of Conditional Branches written specifically for this Program Data Point.
Entered from questionnaire, in thousands.


Sample  Number 


000  5.000  0.  0.  1.000  8.000  100.000 
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0. 

0. 

0. 

1.000 

168 

169  2.000  1.000  0.  8.000  0.  0.  0.  3.000  169
170  7.000  840.000  0.  8.000  0.  0.  1.000  2.000  170
171  2.000  6.000  0.  8.000  0.  0.  0.  3.000  171
173  1.000  4.000  0.  8.000  0.  0.  0.  2.000  173
174  5.000  1500.000  1.000  80.000  1.000  1.000  1.000  1.000  174
175  7.000  525.000  1.000  8.000  0.  0.  1.000  2.000  175
176  3.000  90.000  1.000  32.000  1.000  1.000  1.000  2.000  176
177  3.000  300.000  1.000  16.000  1.000  1.000  1.000  1.000  177
179  4.000  500.000  1.000  32.000  0.  0.  1.000  0.  179
180  4.000  100.000  1.000  32.000  0.  0.  1.000  0.  180
181  3.000  690.000  1.000  64.000  0.  0.  1.000  2.000  181
182  3.000  240.000  1.000  64.000  1.000  1.000  1.000  1.000  182
184  5.000  250.000  1.000  16.000  1.000  1.000  1.000  2.000  184
185  2.000  100.000  1.000  16.000  1.000  1.000  1.000  1.000  185
186  2.000  100.000  1.000  16.000  1.000  0.  1.000  1.000  186
187  3.000  150.000  1.000  16.000  1.000  1.000  1.000  1.000  187
188  2.000  100.000  1.000  16.000  1.000  1.000  1.000  1.000  188
189  3.000  150.000  1.000  16.000  1.000  1.000  1.000  1.000  189
190  20.000  1000.000  1.000  8.000  0.  0.  1.000  2.000  190
191  3.000  300.000  1.000  16.000  1.000  1.000  1.000  1.000  191
192  0.  0.  1.000  80.000  0.  0.  0.  2.000  192
193  3.000  90.000  1.000  16.000  0.  0.  0.  3.000  193
194  4.000  600.000  1.000  32.000  0.  0.  1.000  2.000  194
195  5.000  5.000  1.000  160.000  0.  1.000  0.  2.000  195
196  11.000  44.000  1.000  100.000  1.000  1.000  0.  2.000  196
197  1.000  8.000  1.000  16.000  0.  0.  0.  2.000  197
198  3.000  60.000  1.000  80.000  0.  1.000  1.000  2.000  198
199  1.000  60.000  1.000  160.000  0.  0.  0.  2.000  199
200  2.000  30.000  1.000  80.000  1.000  1.000  0.  3.000  200
201  0.  0.  1.000  60.000  1.000  1.000  1.000  0.  201
202  1.000  92.000  1.000  32.000  0.  0.  0.  1.000  202
203  1.000  70.000  1.000  32.000  0.  0.  0.  1.000  203
204  1.000  20.000  0.  65.000  0.  0.  1.000  3.000  204
205  4.000  180.000  1.000  32.000  0.  0.  0.  2.000  205


Sample  Number 


64  Average Type III programmer experience with production language used in
developing the Program Data Point. Coded in months.


63  Average Type II programmer experience with production language used in
developing the Program Data Point. Coded in months.


62  Average Type I programmer experience with production language used in
developing the Program Data Point. Coded in months.


61  Maximum Number of Type IV Programmers assigned at one time; a Type IV
programmer formulates and plans new program system applications, is highly
creative in designing and developing major computer program systems.
Entered from questionnaire.


60  Maximum Number of Type III Programmers assigned at one time; a Type III
programmer conceives, develops and improves large, complex computer
programs. Entered from questionnaire.


59  Maximum Number of Type II Programmers assigned at one time; a Type II
programmer develops programs to solve well defined problems; prepares flow
charts, writes instructions, tests programs, modifies established computer
programs. Entered from questionnaire.


58  Maximum Number of Type I Programmers assigned at one time; a Type I programmer
writes machine language instructions from flow charts, helps prepare flow
charts and test programs. Entered from questionnaire.


57  Were there any data processing components to be used by the Program Data Point
being developed concurrently with the program? Coded: Yes = 1; No = 0.


Sample  Number 
57  58  59  60

44  0.  2.000  4.000  1.000
45  1.000  0.  0.  2.000
46  0.  0.  5.000  2.000
47  0.  0.  1.000  2.000
48  0.  2.000  4.000  3.000
49  1.000  2.000  2.000  2.000
50  1.000  2.000  6.000  7.000
52  0.  1.000  4.000  2.000
53  0.  0.  6.000  4.000
54  1.000  0.  5.000  3.000
55  0.  0.  8.000  3.000
56  1.000  7.000  10.000  6.000
57  0.  2.000  3.000  2.000
58  0.  10.000  50.000  15.000
59  1.000  0.  18.000  2.000
60  0.  4.000  10.000  4.000
61  1.000  2.000  10.000  0.
62  1.000  5.000  13.000  12.000
63  0.  1.000  6.000  4.000
64  1.000  3.000  15.000  15.000
71  0.  2.000  4.000  1.000
72  0.  1.000  4.000  2.000
80  0.  0.  6.000  3.000
84  0.  1.000  2.000  5.000
85  0.  0.  1.000  1.000
86  0.  0.  3.000  4.000
100  0.  0.  2.000  1.000
101  0.  0.  1.000  1.000
102  0.  0.  1.000  0.
104  0.  0.  2.000  0.
105  0.  0.  0.  1.000
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0. 

0. 

1.000 

0. 

107 

0. 

4.000 

4.000 

1.000 

109 

1.000 

1.000 

3.000 

1.000 

110

0. 

0. 
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0. 
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0. 
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0. 
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0. 
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0. 

0. 
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0. 

61  62  63  64

0. 

0. 

0. 

0. 

44 
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0. 

0. 

48.000 

45 

1.000 

0. 

12.000 

12.000 

46 

1.000 

0. 

0. 

0. 

47 

2.000 

6.000 

12.000 

24.000 

48 

0. 

0. 

0. 

0. 

49 
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0. 

0. 

0. 

50 

1.000 

9.000 

9.000 

9.000 

52 

4.000 

0. 

12.000 

36.000 

53 

2.000 

0. 

24.000 

36.000 

54 

1.000 

0. 

30.000 

48.000 

55 
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0. 

12.000 

24.000 

56 

1.000 

12.000 

24.000 

36.000 

57 

15.000 

6.000 

24.000 

36.000 

58 

0. 

0. 

0. 

0. 

59 

0. 

12.000 

36.000 

60.000 

60 

0. 

4.000 

24.000 

0. 

61 

2.000 

0. 

24.000 

72.000 

62 

1.000 

0. 

0. 

0. 

63 

3.000 

0. 

36.000 

84.000 

64 

0. 

0. 

0. 

0. 

71 
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0. 

0. 
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0. 

0. 

0. 

0. 
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0. 

0. 

0. 

0. 

158  0.  0.  1.000  0.
159  0.  0.  2.000  1.000

61  62  63  64

2.000  0.  0.  2.000  118
1.000  0.  12.000  0.  119
1.000  0.  0.  24.000  120
0.  0.  0.  0.  121
1.000  0.  24.000  0.  122
1.000  0.  36.000  0.  123
3.000  0.  36.000  36.000  124
1.000  0.  12.000  24.000  125
0.  0.  0.  24.000  126
0.  0.  0.  12.000  127
2.000  0.  30.000  0.  128
0.  0.  12.000  24.000  129
1.000  0.  0.  0.  130
0.  0.  0.  0.  131
0.  0.  24.000  84.000  132
0.  0.  24.000  60.000  133
0.  0.  8.000  8.000  134
1.000  0.  0.  36.000  135
1.000  0.  12.000  60.000  136
0.  0.  24.000  0.  137
0.  0.  0.  99.000  138
0.  0.  0.  99.000  139
1.000  0.  0.  0.  140
1.000  0.  0.  0.  141
0.  0.  0.  12.000  142
0.  0.  24.000  48.000  143
0.  0.  0.  0.  144
2.000  24.000  48.000  72.000  145
0.  0.  36.000  3.000  146
0.  0.  0.  12.000  147
3.000  12.000  24.000  24.000  148

0. 

0. 

2.000 

0. 
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1.000 

0. 

0. 
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1.000 

0. 

0. 
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1.000 

0. 

0. 

0. 
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0. 
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24.000 

0. 
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0. 

0. 
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0. 
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0. 

0. 
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0. 
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0. 

0. 

0. 

157 

0. 

0. 
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0. 
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0. 
57  58  59  60  61  62  63  64
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0. 

0. 

1.000 

0. 

0. 

0. 

36.000 

0. 
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0. 

0. 

0. 
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0. 

0. 

0. 

7.000 
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0. 

0. 

1.000 

0. 

0. 

0. 

24.000 

0. 
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0. 

0. 

0. 

1.000 

0. 

0. 

0. 

18.000 

163 
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0. 

0. 

0. 

0. 

1.000 

0. 

0. 

0. 
165  0.  0.  0.  0.  1.000  0.  0.  0.  165
166  0.  6.000  2.000  0.  1.000  0.  0.  0.  166
167  0.  0.  2.000  2.000  1.000  0.  0.  0.  167
168  0.  0.  0.  1.000  0.  0.  0.  36.000  168
169  0.  0.  0.  0.  1.000  0.  0.  0.  169
170  0.  0.  1.000  0.  1.000  0.  12.000  0.  170
171  0.  0.  1.000  0.  1.000  0.  36.000  0.  171
173  0.  0.  1.000  0.  1.000  0.  12.000  0.  173
174  0.  0.  2.000  1.000  3.000  0.  12.000  24.000  174
175  0.  1.000  2.000  6.000  0.  14.000  36.000  60.000  175
176  1.000  0.  4.000  2.000  1.000  0.  6.000  12.000  176
177  0.  5.000  2.000  0.  0.  12.000  12.000  0.  177
179  1.000  0.  0.  4.000  1.000  0.  0.  0.  179
180  1.000  2.000  2.000  2.000  1.000  0.  0.  0.  180
181  1.000  1.000  8.000  1.000  1.000  0.  24.000  60.000  181
182  0.  3.000  6.000  3.000  0.  3.000  3.000  3.000  182
184  0.  0.  3.000  1.000  0.  0.  0.  0.  184
185  0.  0.  5.000  1.000  0.  0.  12.000  24.000  185
186  0.  0.  6.000  2.000  0.  0.  12.000  24.000  186
187  0.  0.  5.000  2.000  0.  0.  12.000  24.000  187
188  0.  0.  5.000  2.000  0.  0.  12.000  24.000  188
189  0.  0.  5.000  2.000  0.  0.  12.000  24.000  189

190 

1.000 

0. 

3.000 
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Sample  Number 


72  Maximum  number  of  programmers  assigned  to  the  project  at  one  time.  Entered 
from  questionnaire. 


71  Total  number  of  programmers  who  worked  for  the  duration  of  the  project. 
Entered  from  questionnaire. 


70  Total number of programmers who participated in program design. Coded in
months.


69  Average  Type  IV  programmer  experience  with  the  application  represented  by  the 
Program  Data  Point.  Coded  in  months. 


68  Average  Type  III  programmer  experience  with  the  application  represented  by  the 
Program  Data  Point.  Coded  in  months. 


67  Average  Type  II  programmer  experience  with  the  application  represented  by  the 
Program  Data  Point.  Coded  in  months. 


66  Average  Type  I  programmer  experience  with  the  application  represented  by  the 
Program  Data  Point.  Coded  in  months. 


65  Average  Type  IV  programmer  experience  with  production  language  used  in 
developing  the  Program  Data  Point.  Coded  in  months. 
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80  Was  the  Program  Data  Point  developed  at  a  site  other  than  the  operational 
location? Coded: Yes = 1; No = 0.


79  Was time-sharing implemented in program production? Coded: Yes = 1; No = 0.


78  Was  the  production  computer  facility  operated  on  the  basis  of  an  open  shop 
or  a  closed  shop?  Coded:  open  shop  =  0;  closed  shop  =  1. 


77  Was  the  production  computer  operated  by  an  organization  other  than  the 
Program Data Point developer? Coded: Yes = 1; No = 0.


76  Estimated  customer  experience  and  knowledge  concerning  the  development  of 
automatic  data  processing  systems.  Coded:  extensive  =  0;  limited  =  1; 
none  =  2. 


75  Number  of  agencies  whose  concurrence  was  required  on  operational  design 
specifications.  Entered  from  questionnaire. 


74  Implementation of Management Procedures. Coded in number of no replies.


73 


Sample  Number 
Was the Program Data Point an independent program in which there are no
interfaces with other programs? Coded: Yes = 1; No = 0.


Total  object  instructions  generated  from  procedure-oriented  source  statements. 
Entered  from  questionnaire,  in  thousands. 


Cost of developmental computer, based on equivalent purchase cost. Coded:
large-scale ($750,000 and up) = 0; medium-scale ($100,000 to $749,000) = 1;
small-scale (less than $100,000) = 2.


Was  the  Program  Data  Point  produced  at  a  military  installation?  Coded: 
Yes  =  1;  No  =  0. 


Was the Program Data Point written at SDC? Coded: Yes = 1; No = 0.


Does  the  Program  Data  Point  represent  a  command  and  control  application? 
Coded: Yes = 1; No = 0.
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APPENDIX  E 


LIST  OF  FORMED  VARIABLES 


Two  types  of  variables  were  used  as  input  to  the  statistical  analysis: 

(a)  those  obtained  directly  from  the  questionnaire,  and  (b)  those  that  were 
formed  by  various  combinations  of  the  variables  in  the  collection  form. 

This  Appendix  contains  the  list  of  formed  variables  used  in  the  analysis. 

The  variables  are  of  several  forms,  e.g.,  logarithmic  transformations,  ratios, 
machine  capability,  etc.,  all  created  in  an  effort  to  evaluate  their  effects 
on  programming  cost. 

Several  of  these  variables  were  found  to  influence  costs  and  are  included 
in  the  winnowed  (reduced)  list  of  variables  in  Appendix  F.  The  variables 
that  appeared  in  the  questionnaire  are  listed,  with  the  corresponding  data, 
in  Appendix  D. 
Name of Formed Variable
    Formation


Fernbach's Ranking of Machine Capability
    Log10 [50 (number of bits in word + 10 x machine add time)]

Shaw's Ranking of Machine Capability*
    (formation illegible in the source copy)

Total Man Months (not including man months to develop utility programs used
in production)
    Man Months, exclusive of utility programs + Man Months including
    utility programs + Man Months for executive and utility programs
    developed specifically for data point.

Total Pages of Documentation
    Number of pages of internal documentation + number of pages of external
    documentation.

Number of Different Types of Documents
    Number of types of internal documents + number of types of external
    documents.

Object Documentation Rate
    Total pages of documentation / total object instructions generated for
    specific data point.

Source Documentation Rate
    Total pages of documentation / (total MOL source instructions written +
    total POL source instructions written)

Production Rate (Object)
    (Number of Object Instructions Generated for Specific Data Point) /
    (Total Man Months)

Production Rate (Source)
    (Total MOL Source Instr. Written + Total POL Source Instr. Written) /
    (Total Man Months)

POL Usage Rate
    (Total POL Source Instr. Written) /
    (Total MOL Source Instr. Written + Total POL Source Instr. Written)

Computer Usage Rate (Object)
    (Total Computer Hours) /
    (Number of Object Instructions Generated for Specific Data Point)

Computer Usage Rate (Source)
    (Total Computer Hours) /
    (Total MOL Source Instr. Written + Total POL Source Instr. Written)

POL Expansion Ratio
    (Number of Object Instructions Generated from POL Source Statements) /
    (Total POL Source Instructions Written)

MOL Expansion Ratio
    [(Number Object Instr. Generated for Specific Data Point) - (Number
    Object Instr. Generated from POL Source Instructions)] /
    (Total MOL Source Instructions Written)


*For a full discussion of this ranking scheme, see SP-2418/000/00, Cost Estimation for Computer Program Development:
A Progress Report and an Evaluation, G. F. Weinwurm.
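
As an illustration of how the ratio-type variables above are formed from the questionnaire entries, the following is a minimal sketch, not the program used in the analysis. The class, field, and function names are hypothetical; returning None when a denominator is zero is also an assumption, since the report does not say how such cases were handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataPoint:
    man_months: float            # Total Man Months
    computer_hours: float        # Total Computer Hours
    object_instructions: float   # object instructions generated for the data point
    object_from_pol: float       # object instructions generated from POL statements
    mol_source: float            # total MOL source instructions written
    pol_source: float            # total POL source instructions written


def ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def formed_ratios(p: DataPoint) -> dict:
    """Form the rate and ratio variables defined in the table above."""
    source = p.mol_source + p.pol_source
    return {
        "production_rate_object": ratio(p.object_instructions, p.man_months),
        "production_rate_source": ratio(source, p.man_months),
        "pol_usage_rate": ratio(p.pol_source, source),
        "computer_usage_rate_object": ratio(p.computer_hours, p.object_instructions),
        "computer_usage_rate_source": ratio(p.computer_hours, source),
        "pol_expansion_ratio": ratio(p.object_from_pol, p.pol_source),
        "mol_expansion_ratio": ratio(p.object_instructions - p.object_from_pol,
                                     p.mol_source),
    }
```

For a hypothetical data point with 10 man months, 50 computer hours, 5000 object instructions (3000 of them from POL statements), and 1000 source instructions each of MOL and POL, the object production rate is 500 instructions per man month and the POL and MOL expansion ratios are 3 and 2.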


Name of Formed Variable
    Formation


Proportion of Senior Programmers
    (Maximum Number of Type III and Type IV Programmers Assigned) /
    (Maximum Number of Types I, II, III, and IV Programmers Assigned)

Average Programmer Experience with Development Language
    [(Maximum Number of Type I Programmers x Average Months Experience
    with Development Language) + (Maximum Number of Type II Programmers x
    Average Months Experience with Development Language) + (Maximum Number
    of Type III Programmers x Average Months Experience with Development
    Language) + (Maximum Number of Type IV Programmers x Average Months
    Experience with Development Language)] /
    (Maximum Number of Programmers Assigned)

Average Programmer Experience with Program Application
    [(Maximum Number of Type I Programmers x Average Months Experience
    with Program Application) + (Maximum Number of Type II Programmers x
    Average Months Experience with Program Application) + (Maximum Number
    of Type III Programmers x Average Months Experience with Program
    Application) + (Maximum Number of Type IV Programmers x Average
    Months Experience with Program Application)] /
    (Maximum Number of Programmers Assigned)

Proportion of Programmers Participating in Design
    (Number of Programmers Participating in Design) /
    (Maximum Number of Programmers Assigned)

Naperian Logarithm of Average Round Trip Distance/Trip
    ln (Average Round-Trip Distance)

Logarithm of Number of Words in Data Base
    Log10 (Number of Words in Data Base)

Logarithm of Number of Classes of Items in the Data Base
    Log10 (Number of Classes of Items in the Data Base)

Square Root of Number of Classes of Items in the Data Base
    Square root of (Number of Items in Data Base)

Total Message Types
    Number of Input Message Types + Number of Output Message Types

Square Root of Total Pages of External Documentation
    Square root of (Total Pages of External Documentation)

Average Programmers per Month
    Total Man Months / Months Elapsed

Programmer Continuity
    (Number of Programmers for Duration of Project) /
    (Maximum Number of Programmers Assigned at One Time)

Index of Programmer Continuity
    Average Men per Month / Maximum Number of Programmers Assigned at One Time
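
The two experience averages above are weighted means in which each programmer type's average months of experience is weighted by the maximum number of programmers of that type. The sketch below illustrates the arithmetic under stated assumptions: the function names and the dictionary layout keyed by programmer type are hypothetical, and the divisor is taken to be the separately reported maximum number of programmers assigned, as the table states.

```python
PROGRAMMER_TYPES = ("I", "II", "III", "IV")


def proportion_senior(max_by_type: dict) -> float:
    """(Type III + Type IV) / (Types I + II + III + IV), using maximum numbers assigned."""
    total = sum(max_by_type[t] for t in PROGRAMMER_TYPES)
    if total == 0:
        raise ValueError("no programmers assigned")
    return (max_by_type["III"] + max_by_type["IV"]) / total


def average_experience(max_by_type: dict, months_by_type: dict,
                       max_programmers_assigned: float) -> float:
    """Weighted average months of experience (language or application)."""
    if max_programmers_assigned <= 0:
        raise ValueError("max_programmers_assigned must be positive")
    weighted = sum(max_by_type[t] * months_by_type[t] for t in PROGRAMMER_TYPES)
    return weighted / max_programmers_assigned


# Hypothetical data point: 2 Type I, 4 Type II, 1 Type III, 1 Type IV programmer.
staff = {"I": 2, "II": 4, "III": 1, "IV": 1}
language_months = {"I": 6, "II": 12, "III": 36, "IV": 60}
senior_share = proportion_senior(staff)                          # 2 / 8
avg_language = average_experience(staff, language_months, 8)     # 156 / 8
```

The same function serves for both formed variables; only the months-of-experience entries differ (variables 62 to 65 for language, 66 to 69 for application).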


Name of Formed Variable
    Formation


Estimated Elapsed Time at Maximum Staffing
    Total Man Months / Maximum Number of Programmers Assigned at One Time

Number of Input Variables to Memory
    (Number of Classes of Items in Data Base) /
    (Core Size of Developmental Computer)

Object Instructions to Memory
    (Number of Object Instructions Generated for Specific Data Point) /
    (Core Size of Developmental Computer)

Correction Factor for Effective Number of Programmers
    Maximum Number of Programmers Assigned at One Time x
    [.4 + .6 (Continuity of Programmers)^2]

Total Number of Input Variables
    Number of Input Message Types x Average Number of Input Items per Type

Total Number of Output Variables
    Number of Output Message Types x Average Number of Output Items per Type

Logarithm of Number of Subroutines
    Log10 (Number of Subroutines)

Logarithm of Input Message Types
    Log10 (Input Message Types)

Logarithm of Output Message Types
    Log10 (Output Message Types)

Total Source Instructions
    Number of MOL Source Instructions + Number of POL Source Instructions

Source MOL Production Rate
    Number of MOL Source Instructions / Total Man Months
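
The correction factor for the effective number of programmers combines two of the formed variables. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the "2" following "(Continuity of Programmers)" in the formation is an exponent and that Programmer Continuity is the ratio of programmers who worked for the duration of the project to the maximum number assigned at one time.

```python
def programmer_continuity(programmers_for_duration: float,
                          max_programmers_at_one_time: float) -> float:
    """Programmers who stayed for the whole project, as a share of peak staffing."""
    if max_programmers_at_one_time <= 0:
        raise ValueError("max_programmers_at_one_time must be positive")
    return programmers_for_duration / max_programmers_at_one_time


def effective_programmers(max_programmers_at_one_time: float,
                          continuity: float) -> float:
    """Maximum programmers x [.4 + .6 x continuity squared] (exponent assumed)."""
    return max_programmers_at_one_time * (0.4 + 0.6 * continuity ** 2)


# Hypothetical project: peak staff of 10, of whom 5 stayed for the duration.
continuity = programmer_continuity(5, 10)
effective = effective_programmers(10, continuity)
```

Under this reading, full continuity leaves the peak staffing unchanged (factor 1.0), while a project on which no programmer stayed for the duration is credited with 40 percent of its peak staffing.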


APPENDIX  F 


CORRELATION  OF  SELECTED  PREDICTORS  WITH  COST  MEASURES, 
OBJECT  PRODUCTION  RATE,  AND  OBJECT  COMPUTER  USAGE  RATE 


This  Appendix  contains  the  correlation  coefficients  of  selected  predictors, 
resulting  from  the  first  winnowing  phase,  with  the  major  cost  measures,  object 
production rate and object computer usage rate (computer hours/1000 object
instructions). The cost variables are identified in column headings; the
selected  independent  variables  (i.e.,  cost  factors)  are  noted  by  titles. 

The variables in this Appendix were selected on the basis of intuitive
judgment, previous experience and correlation with the cost measures.

The  correlation  coefficient  is  a  measure  of  the  statistical  relationship 
between  variables.  For  a  sample  size  of  169,  the  null  correlation,  i.e., 
the  correlation  that  could  occur  by  chance  two  thirds  of  the  time  without 
any relationship between two variables, is equal to +.076. This means that
if  we  conclude  that  a  statistical  relationship  exists  between  variables  with 
a  correlation  coefficient  of  +.15  or  greater,  our  conclusions  will  be  incorrect 
only  5  percent  of  the  time. 
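
The thresholds quoted above can be checked approximately with the usual large-sample result that, when two variables are unrelated, the sample correlation has a standard deviation of about 1/sqrt(n - 1). This is a sketch under that assumption; the report does not state which formula was used, and the function name is hypothetical.

```python
import math


def null_correlation_sd(sample_size: int) -> float:
    """Approximate standard deviation of a sample correlation coefficient
    when the true correlation is zero (large-sample approximation)."""
    if sample_size < 3:
        raise ValueError("sample_size must be at least 3")
    return 1.0 / math.sqrt(sample_size - 1)


one_sigma = null_correlation_sd(169)   # roughly two thirds of chance values fall within this
two_sigma = 2 * one_sigma              # roughly 95 percent of chance values fall within this
```

For 169 data points the approximation gives about .077 and .154, close to the .076 and .15 quoted in the text; the small difference presumably reflects rounding or a slightly different formula.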
Selected Predictors                           Months        Computer      Number of     Man           Object        Object
                                              Elapsed       Hours         Object        Months        Production    Computer
                                                                          Instructions                Rate          Usage Rate

Number of Man Trips                           29            21            7             23            -10           14
Innovation                                    13            6             19            6             9             5
Operational Requirements Known                0             5             0             8             -7            2
Number of Organizational Users                2             -3            24            -5            1             -8
Response Time Requirements                    44            52            7             42            -26           43
Number of ADP Centers                         11            8             0             15            -14           9
Operational Characteristics                   11            0             -9            -4            -8            -8
Design Characteristics                        13            -8            11            6             7             -13
Source Instructions (MOL)                     44            44            85            58            1             1
Source Instructions (POL)                     27            23            28            0             22            -3
Number of Subroutines                         26            14            62            20            -1            1
Number of Classes of Items in the Data Base   7             -3            5             -4            18            -5
Stability of Design                           22            18            0             20            0             13
Complexity of Design                          10            14            -14           11            -14           20
Percent Clerical Instructions                 -4            2             7             -6            6             -1
Percent Mathematical Instructions             0             -2            -2            7             -1            -8
Percent Input/Output Instructions             -7            -11           -15           -9            -7            -1
Percent Logical Control Instructions          14            13            10            13            -6            9
Percent Self-Checking - FIX Instructions      -4            -4            -3            -1
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